Dear DATE community,
We, the DATE Sponsors Committee (DSC) and the DATE Executive Committee (DEC), are deeply shocked and saddened by the tragedy currently unfolding in Ukraine, and we would like to express our full solidarity with all the people and families affected by the war.
Our thoughts also go out to everyone in Ukraine and Russia, whether they are directly or indirectly affected by the events, and we extend our deep sympathy.
We condemn Russia’s military action in Ukraine, which violates international law, and we call on governments to take immediate action to protect everyone in the country, in particular its civilian population and the people affiliated with its universities.
Now more than ever, our DATE community must promote our societal values (justice, freedom, respect, community, and responsibility) and confront this situation collectively and peacefully to end this senseless war.
DATE Sponsors and Executive Committees.
Kindly note that all times on the virtual conference platform are displayed in the user's time zone.
The time zone for all times mentioned on the DATE website is CET – Central European Time (UTC+1).
O.1 Opening
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 08:30 CET - 09:15 CET
Session chair:
Cristiana Bolchini, Politecnico di Milano, IT
Session co-chair:
Ingrid Verbauwhede, KU Leuven, BE
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
08:30 CET | O.1.1 | OPENING Speakers: Cristiana Bolchini1 and Ingrid Verbauwhede2 1Politecnico di Milano, IT; 2KU Leuven - COSIC, BE Abstract DATE 2022 opening |
09:00 CET | O.1.2 | AWARDS Speakers: Donatella Sciuto1, David Atienza2 and Yervant Zorian3 1Politecnico di Milano, IT; 2École Polytechnique Fédérale de Lausanne (EPFL), CH; 3Synopsys, US Abstract DATE 2022 awards presentation |
K.1 Opening keynote #1: "What is beyond AI? Societal opportunities and electronic design automation"
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 09:20 CET - 10:10 CET
Session chair:
Cristiana Bolchini, Politecnico di Milano, IT
The success of hardware in enabling AI acceleration and broadening its scope has been nothing short of remarkable. How do we use the power of hardware design and electronic design automation to instead make the world a better place? EDA will be the cornerstone of innovative solutions in ensuring data privacy, sustainable computing and taming the data flood.
Speaker's bio: Valeria Bertacco is Thurnau Professor of Computer Science and Engineering at the University of Michigan, and Adjunct Professor of Computer Engineering at the Addis Ababa Institute of Technology. Her research interests are in the area of computer design, with emphasis on specialized architecture solutions and design viability, in particular reliability, validation, and hardware-security assurance. Her research endeavors are supported by the Applications Driving Architectures (ADA) Research Center, which Valeria directs. The ADA Center, sponsored by a consortium of semiconductor companies, has the goal of reigniting computing-systems design and innovation for the 2030s and 2040s, through specialized heterogeneity, domain-specific language abstractions, and new silicon devices that show benefit to applications. Valeria joined the University of Michigan in 2003. She currently serves as the Vice Provost for Engaged Learning at the University of Michigan, supporting all co-curricular engagements and international partnerships for the institution, and facilitating the work of several central units, whose goals range from promoting environmental sustainability, to promoting the arts in research universities, to increasing the participation of gender minorities in the academy.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
09:20 CET | K.1.1 | WHAT IS BEYOND AI? SOCIETAL OPPORTUNITIES AND ELECTRONIC DESIGN AUTOMATION Speaker and Author: Valeria Bertacco, University of Michigan, US Abstract The success of hardware in enabling AI acceleration and broadening its scope has been nothing short of remarkable. How do we use the power of hardware design and electronic design automation to instead make the world a better place? EDA will be the cornerstone of innovative solutions in ensuring data privacy, sustainable computing and taming the data flood. |
10:00 CET | K.1.2 | Q&A SESSION Author: Cristiana Bolchini, Politecnico di Milano, IT Abstract Questions and answers with the speaker |
K.2 Opening keynote #2: "Cryo-CMOS Quantum Control: from a Wild Idea to Working Silicon"
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 10:10 CET - 11:00 CET
Session chair:
Giovanni De Micheli, EPFL, CH
The core of a quantum processor is generally an array of qubits that need to be controlled and read out by a classical processor. This processor operates on the qubits with nanosecond latency, several million times per second, with tight constraints on noise and power, because the extremely weak signals involved in the process require highly sensitive circuits and systems along with very precise timing capability. We advocate the use of CMOS technologies to achieve these goals, with the circuits operated at deep-cryogenic temperatures. We believe that these circuits, collectively known as cryo-CMOS control, will make future qubit arrays scalable, enabling a faster growth in qubit count. In the lecture, the challenges of designing and operating complex circuits and systems at 4 K and below will be outlined, along with preliminary results achieved in the control and read-out of qubits by ad hoc integrated circuits.
Speaker's bio: Edoardo Charbon (SM’00, F’17) received the Diploma from ETH Zurich, the M.S. from the University of California at San Diego, and the Ph.D. from the University of California at Berkeley in 1988, 1991, and 1995, respectively, all in electrical engineering and EECS. He has consulted with numerous organizations, including Bosch, X-Fab, Texas Instruments, Maxim, Sony, Agilent, and the Carlyle Group. He was with Cadence Design Systems from 1995 to 2000, where he was the architect of the company's initiative on information hiding for intellectual property protection. In 2000, he joined Canesta Inc. as Chief Architect, where he led the development of wireless 3-D CMOS image sensors. Since 2002 he has been a member of the faculty of EPFL. From 2008 to 2016 he was with Delft University of Technology as full professor and Chair of VLSI design. He has been the driving force behind the creation of deep-submicron CMOS SPAD technology, which has been mass-produced since 2015 and is present in telemeters, proximity sensors, and medical diagnostics tools. His interests span from 3-D vision, LiDAR, FLIM, FCS, and NIROT to super-resolution microscopy, time-resolved Raman spectroscopy, and cryo-CMOS circuits and systems for quantum computing. He has authored or co-authored over 400 papers and two books, and he holds 23 patents. Dr. Charbon is a distinguished visiting scholar of the W. M. Keck Institute for Space at Caltech, a fellow of the Kavli Institute of Nanoscience Delft, a distinguished lecturer of the IEEE Photonics Society, and a fellow of the IEEE.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
10:10 CET | K.2.1 | CRYO-CMOS QUANTUM CONTROL: FROM A WILD IDEA TO WORKING SILICON Speaker and Author: Edoardo Charbon, École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract The core of a quantum processor is generally an array of qubits that need to be controlled and read out by a classical processor. This processor operates on the qubits with nanosecond latency, several million times per second, with tight constraints on noise and power, because the extremely weak signals involved in the process require highly sensitive circuits and systems along with very precise timing capability. We advocate the use of CMOS technologies to achieve these goals, with the circuits operated at deep-cryogenic temperatures. We believe that these circuits, collectively known as cryo-CMOS control, will make future qubit arrays scalable, enabling a faster growth in qubit count. In the lecture, the challenges of designing and operating complex circuits and systems at 4 K and below will be outlined, along with preliminary results achieved in the control and read-out of qubits by ad hoc integrated circuits. |
10:50 CET | K.2.2 | Q&A SESSION Author: Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract Questions and answers with the speaker |
1.1 Scalable quantum stacks: current status and future prospects
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 11:00 CET - 12:30 CET
Session chair:
Fabio Sebastiano, TU Delft, NL
Session co-chair:
Giovanni De Micheli, EPFL, CH
In this session we explore quantum computing from the quantum algorithm to the qubit, going through the compilation process. In this context, we look at similarities with conventional computing in the overall quantum stack architecture and differences in the control of qubit processors. From these and other perspectives, the session will offer a view into the future of quantum computers.
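As background for the compilation theme of this session (in particular talk 1.1.2, which lets designers compile classical logic such as a plain Python function into quantum circuits), the following minimal, library-agnostic sketch shows the mapping such compilers automate: a Boolean function f on n bits becomes the reversible oracle U_f with U_f|x, y⟩ = |x, y ⊕ f(x)⟩, i.e., a permutation of the computational basis states. The code is illustrative only; the names and structure are not tweedledum's actual API, and a real compiler emits a gate-level circuit rather than an explicit permutation.

```python
# Minimal, library-agnostic sketch: turning a classical Boolean function into
# the permutation realised by its reversible/quantum oracle U_f,
#   U_f |x, y> = |x, y XOR f(x)>.
# This is the standard construction that quantum compilers automate; names and
# structure here are illustrative only, not any library's API.

def majority(bits):
    """Example classical logic: 3-input majority, written as ordinary Python."""
    a, b, c = bits
    return (a & b) | (a & c) | (b & c)

def oracle_permutation(f, n):
    """Return U_f as a permutation of the 2**(n+1) computational basis states.

    The basis-state index encodes (x, y): the top n bits are x, the last bit is y.
    """
    perm = []
    for index in range(2 ** (n + 1)):
        x = [(index >> (n - i)) & 1 for i in range(n)]  # input bits, MSB first
        y = index & 1                                   # target/ancilla bit
        y_out = y ^ f(x)                                # y XOR f(x)
        perm.append((index >> 1) * 2 + y_out)           # same x, new y
    return perm

if __name__ == "__main__":
    perm = oracle_permutation(majority, n=3)
    # A reversible oracle must be a bijection on basis states.
    assert sorted(perm) == list(range(16))
    print(perm)
```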
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
11:00 CET | 1.1.1 | FULL-STACK QUANTUM COMPUTING SYSTEMS IN THE NISQ ERA: ALGORITHM-DRIVEN AND HARDWARE-AWARE COMPILATION TECHNIQUES Speaker: Carmen G. Almudever, TU Valencia, ES Authors: Medina Bandic1, Sebastian Feld1 and Carmen G. Almudever2 1Delft University of Technology, NL; 2TU Valencia, ES Abstract The progress in developing quantum hardware, with functional quantum processors integrating tens of noisy qubits, together with the availability of near-term quantum algorithms, has led to the release of the first quantum computers. These quantum computing systems already integrate different software and hardware components of the so-called "full-stack", bridging quantum applications to quantum devices. In this paper, we will provide an overview of current full-stack quantum computing systems. We will emphasize the need for tight co-design among adjacent layers as well as vertical cross-layer design to extract the most from noisy intermediate-scale quantum (NISQ) processors, which are both error-prone and severely constrained in resources. As an example of co-design, we will focus on the development of hardware-aware and algorithm-driven compilation techniques. |
11:30 CET | 1.1.2 | TWEEDLEDUM: A COMPILER COMPANION FOR QUANTUM COMPUTING Speaker: Bruno Schmitt, EPFL, CH Authors: Bruno Schmitt and Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract This work presents tweedledum—an extensible open-source library aiming at narrowing the gap between high-level algorithms and physical devices by enhancing the expressive power of existing frameworks. For example, it allows designers to insert classical logic (defined at a high abstraction level, e.g., a Python function) directly into quantum circuits. We describe its design principles, concrete implementation, and, in particular, the library's core: an intuitive and flexible intermediate representation (IR) that supports different abstraction levels across the same circuit structure. |
12:00 CET | 1.1.3 | A CRYO-CMOS TRANSMON QUBIT CONTROLLER AND VERIFICATION WITH FPGA EMULATION Speaker: Kevin Tien, IBM Research, US Authors: Kevin Tien1, Ken Inoue1, Scott Lekuch1, David Frank1, Sudipto Chakraborty1, Pat Rosno2, Thomas Fox1, Mark Yeck1, Joseph Glick1, Raphael Robertazzi1, Ray Richetta2, John Bulzacchelli1, Daniel Ramirez2, Dereje Yilma2, Andy Davies2, Rajiv Joshi1, Devin Underwood1, Dorothy Wisnieff1, Chris Baks1, Donald Bethune3, John Timmerwilke1, Blake Johnson1, Brian Gaucher1 and Daniel Friedman1 1IBM T.J. Watson Research Center, US; 2IBM Systems, US; 3IBM Almaden Research Center, US Abstract Future generations of quantum computers are expected to operate in a paradigm where multi-qubit devices will predominantly perform circuits to support quantum error correction. Highly integrated cryogenic electronics are a key enabling technology to support the control of the large numbers of physical qubits that will be required in this fault-tolerant, error-corrected regime. Here, we describe our perspectives on cryoelectronics-driven qubit control architectures, and will then describe an implementation of a scalable, low-power, cryogenic qubit state controller that includes a domain-specific processor and an SSB upconversion I/Q-mixer-based RF AWG. We will also describe an FPGA-based emulation platform that is able to closely reproduce the system intention, and which was used to verify different aspects of the ASIC system design in in situ transmon qubit control experiments. |
K.3 Lunch Keynote: "Batteries: powering up the next generations"
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 13:10 CET - 14:00 CET
Session chair:
Marco Casale-Rossi, Synopsys, IT
Session co-chair:
Enrico Macii, Politecnico di Torino, IT
The quest for energy, ideally from renewable sources, is rapidly increasing, driven by new digital technologies that take up more and more space in our lives and by electric vehicles that are expected to replace combustion ones. However, today’s battery technology is lagging behind adjacent technological advances: most devices use lithium-ion batteries, which bring with them a number of concerns, not least their availability in Europe. To create a European energy platform for the future, bringing together renewable energy sources, electric transportation and a connected Internet of Things, a new solution for battery technology needs to be found. This keynote will explore how current challenges can be overcome through the application of advances in new materials, what Europe is doing in the field of batteries, the need for skilled people, and how the future of battery technology can contribute to building a better, greener and more connected world.
Speaker's bio: Silvia Bodoardo is professor at Politecnico di Torino, where she is responsible for the task force on batteries and leads the Electrochemistry Group@Polito. Her research activity is mainly focused on the study of materials for Li-ion and post-Li-ion batteries; the research also deals with cell production and battery testing. She is participating in several EU-funded projects (coordinator of the STABLE project), as well as national and regional ones. She is leader of WP3 on Education in the Battery2030+ initiative and co-chair of WG3 of Batteries Europe. Silvia has organized many conferences and workshops on materials with electrochemical applications and was chairwoman at the launch of the Horizon Prize on Innovative Batteries.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
13:10 CET | K.3.1 | BATTERIES: POWERING UP THE NEXT GENERATIONS Speaker and Author: Silvia Bodoardo, Politecnico di Torino, IT Abstract The quest for energy, ideally from renewable sources, is rapidly increasing, driven by new digital technologies that take up more and more space in our lives and by electric vehicles that are expected to replace combustion ones. However, today’s battery technology is lagging behind adjacent technological advances: most devices use lithium-ion batteries, which bring with them a number of concerns, not least their availability in Europe. To create a European energy platform for the future, bringing together renewable energy sources, electric transportation and a connected Internet of Things, a new solution for battery technology needs to be found. This keynote will explore how current challenges can be overcome through the application of advances in new materials, what Europe is doing in the field of batteries, the need for skilled people, and how the future of battery technology can contribute to building a better, greener and more connected world. |
13:50 CET | K.3.2 | Q&A SESSION Author: Marco Casale-Rossi, Synopsys, IT Abstract Questions and answers with the speaker |
2.1 Energy-autonomous systems for next generation of IoT
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 14:30 CET - 16:00 CET
Session chair:
Marco Casale-Rossi, Synopsys, IT
Session co-chair:
Giovanni De Micheli, EPFL, CH
Energy-autonomous systems hold the promise of perpetual operation for low-power sensing systems and the next generation of the Internet of Things. The key enabling technologies towards this vision are energy-harvesting transducers and energy-efficient converters, including micro-power management, energy storage and ultra-low-power electronics. Harvesting the power required for operation from the surrounding environment exploits several physical effects and specific energy transducers (electromechanical, thermoelectric, photovoltaic, etc.). The limited and intermittent nature of the available power requires dedicated micro-power management circuits for proper interfacing with conventional electronic loads. However, the success of an application and of energy-autonomous systems rests on energy-aware and low-power design from the very beginning. This session will review the main technologies supporting energy-autonomous systems, and will focus on advances in micro-power management circuits and successful applications of energy-harvesting technologies to achieve the next generation of IoT based on perpetual, connected, intelligent devices.
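The session's emphasis on energy-aware design from the very beginning can be made concrete with a back-of-the-envelope energy-neutrality check: the average harvested power, after converter losses, must cover the duty-cycled load. The sketch below is a generic illustration with made-up placeholder numbers; it is not taken from any of the talks in this session.

```python
# Illustrative energy-neutrality check for a duty-cycled IoT node.
# All numbers are hypothetical placeholders, not taken from the talks.

P_HARVEST_AVG = 120e-6   # average harvested power [W], e.g. small indoor PV cell
P_ACTIVE      = 15e-3    # node power while sensing/transmitting [W]
P_SLEEP       = 3e-6     # deep-sleep power [W]
EFF_CONVERTER = 0.80     # end-to-end efficiency of the micro-power manager

def max_duty_cycle(p_harvest, p_active, p_sleep, eff):
    """Largest active duty cycle d such that the average load power
    d * p_active + (1 - d) * p_sleep does not exceed eff * p_harvest."""
    budget = eff * p_harvest
    if budget <= p_sleep:
        return 0.0                      # cannot even sustain sleep mode
    return min(1.0, (budget - p_sleep) / (p_active - p_sleep))

d = max_duty_cycle(P_HARVEST_AVG, P_ACTIVE, P_SLEEP, EFF_CONVERTER)
print(f"energy-neutral duty cycle = {d*100:.2f}%")
print(f"i.e. roughly {d*3600:.1f} s of activity per hour")
```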
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
14:30 CET | 2.1.1 | MICROPOWER MANAGEMENT TECHNIQUES FOR ENERGY HARVESTING APPLICATIONS Speaker and Author: Aldo Romani, University of Bologna, IT Abstract This talk will review the main technologies adopted for energy harvesting with different types of transducers and the associated power conversion techniques targeting the most efficient trade-offs between maximum power point tracking, efficiency and internal consumption. Some specific implementations will be reviewed. Finally, the emerging technology trends will be discussed along with application perspectives. |
15:00 CET | 2.1.2 | FULLY SELF-POWERED WIRELESS SENSORS ENABLED BY OPTIMIZED POWER MANAGEMENT MODULES Speaker and Author: Peter Spies, Fraunhofer IIS, DE Abstract The power supply of wireless sensors can be assisted or completely covered by energy harvesting technologies. Whether fully self-powered operation by energy harvesting is feasible depends strongly on the ambient conditions, the use-case requirements and the available board space for harvesting building blocks. Besides these conditions and requirements, the efficiency of the power-supply functional blocks and the system control can play a major role in achieving fully self-powered and unlimited operation time. The talk will introduce building blocks for energy harvesting power supplies to reach the goal of full autonomy. It will also discuss wireless technologies and system control strategies which are of paramount importance in self-powered wireless sensors. Different application examples will illustrate the introduced building blocks and technologies, with a focus on condition monitoring and predictive maintenance use cases. |
15:30 CET | 2.1.3 | DESIGN OF SELF-SUSTAINING CONNECTED SMART DEVICES Speaker and Author: Michele Magno, ETH Zürich, CH Abstract The Internet of Things is a revolutionary technology which aims to create an ecosystem of connected smart devices and smart sensors providing ubiquitous connectivity between trillions of devices. Recent advancements in the miniaturization of devices with higher computational capabilities and ultra-low-power technology have enabled the vast deployment of sensors, with significant changes in hardware design, software, network architecture, data analytics, data storage and power sources. However, the largest portion of IoT devices is still powered by batteries. This talk will focus on the viable solution of harvesting energy from the environment to provide enough energy to smart devices, achieving self-sustaining smart sensors by combining energy harvesting, low-power devices, and edge computing, including machine learning on low-power processors and even directly on MEMS sensors. |
3.1 Panel: Quantum Software Toolchain
Add this session to my calendar
Date: Monday, 14 March 2022
Time: 16:30 CET - 18:00 CET
Session chair:
Aida Todri Sanial, LIRMM, FR
Session co-chair:
Anne Matsuura, Intel, US
Panellists:
Xin-Chuan (Ryan) Wu, Intel, US
Ali Javadi-Abhari, IBM Research, US
Ross Duncan, Cambridge Quantum Computing / University of Strathclyde, GB
Carmen G. Almudever, TU Valencia, ES
Today’s quantum software toolchains are integral to system-level design of quantum computers. Compilers, system software, qubit simulators, and other software tools are being used to develop and execute quantum workloads and drive architectural research and design of both software and hardware. In this session, industry experts cover the latest software research and development for quantum computing systems.
4.1 Panel: Quantum Hardware
Add this session to my calendar
Date: Tuesday, 15 March 2022
Time: 09:00 CET - 10:30 CET
Session chair:
Anne Matsuura, Intel, US
Session co-chair:
Aida Todri Sanial, LIRMM, FR
Panellists:
Lieven Vandersypen, Delft University of Technology, NL
Lotte Geck, Forschungszentrum Jülich, DE
Steven Brebels, IMEC, BE
Heike Riel, IBM Research, CH
This session highlights recent advancements in qubits and qubit control. Industrial and academic experts present the latest hardware developments for quantum computing, from materials and qubit devices to qubit control systems.
5.1 Novel Design Techniques for Emerging Technologies in Computing
Add this session to my calendar
Date: Tuesday, 15 March 2022
Time: 11:00 CET - 12:30 CET
Session chair:
Scott Robertson Temple, University of Utah, US
This session is devoted to innovations in design techniques for emerging technologies in computing. The first paper proposes a new security locking scheme based on a hybrid CMOS/nanomagnet logic system. The second paper introduces automated methodologies for standard-cell design using reconfigurable transistors. The third paper reports advances in the design of complementary FET devices, which show promise for sub-5nm nodes. The fourth and last paper presents an industrial RTL-to-GDSII flow for the AQFP superconducting logic family, also discussing novel synthesis opportunities for this technology.
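For readers unfamiliar with logic locking, which the first paper (5.1.1) builds on, the toy sketch below recalls the classic XOR key-gate idea: the correct key restores the original function, while a wrong key corrupts the outputs. This is generic background under simplifying assumptions, not the hybrid CMOS/nanomagnet mechanism presented in the paper.

```python
# Toy illustration of XOR-based logic locking (classic key-gate idea):
# an XOR key gate is inserted on an internal wire, so the circuit only
# computes the intended function when the correct key bit is applied.
# Generic background only, not the CMOS/NML scheme presented in 5.1.1.

def original(a, b, c):
    """The IP owner's intended function."""
    return (a & b) ^ c

KEY = 1  # secret key bit chosen at design time

def locked(a, b, c, k):
    """Locked netlist: a key gate XOR is inserted on the (a & b) wire.
    The '^ KEY' term models the re-synthesis that absorbs the correct key,
    so applying k == KEY restores the original function."""
    w = (a & b) ^ k ^ KEY
    return w ^ c

if __name__ == "__main__":
    inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    assert all(locked(a, b, c, KEY) == original(a, b, c) for a, b, c in inputs)
    wrong = sum(locked(a, b, c, 1 - KEY) != original(a, b, c) for a, b, c in inputs)
    print(f"wrong key corrupts {wrong} of {len(inputs)} input patterns")
```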
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
11:00 CET | 5.1.1 | PHYSICALLY & ALGORITHMICALLY SECURE LOGIC LOCKING WITH HYBRID CMOS/NANOMAGNET LOGIC CIRCUITS Speaker: Alexander J. Edwards, University of Texas at Dallas, US Authors: Alexander Edwards1, Naimul Hassan1, Dhritiman Bhattacharya2, Mustafa Shihab1, Peng Zhou1, Xuan Hu1, Jayasimha Atulasimha2, Yiorgos Makris1 and Joseph Friedman1 1University of Texas at Dallas, US; 2Virginia Commonwealth University, US Abstract The successful logic locking of integrated circuits requires that the system be secure against both algorithmic and physical attacks. In order to provide resilience against imaging techniques that can detect electrical behavior, we recently proposed an approach for physically and algorithmically secure logic locking with strain-protected nanomagnet logic (NML). While this NML system exhibits physical and algorithmic security, the fabrication imprecision, noise-related errors, and slow speed of NML incur a significant security overhead cost. In this paper, we therefore propose a hybrid CMOS/NML logic locking solution in which NML islands provide security within a system primarily composed of CMOS, thereby providing physical and algorithmic security with minimal overhead. In addition to describing this proposed system, we also develop a framework for device/system co-design techniques that consider trade-offs between efficiency and security. |
11:20 CET | 5.1.2 | EXPLORING STANDARD-CELL DESIGN FOR RECONFIGURABLE NANOTECHNOLOGIES: A FORMAL APPROACH Speaker: Michael Raitza, TU Dresden, DE Authors: Michael Raitza, Steffen Märcker, Shubham Rai and Akash Kumar, TU Dresden, DE Abstract Standard-cell design has always been a craft, and common field-effect transistors span only a narrow design space. This has changed with reconfigurable transistors. Boolean functions that exhibit multiple dual product terms in their sum-of-product form yield various beneficial circuit implementations with reconfigurable transistors. In this work, we present an approach to automatically generate these implementations through a formal modeling approach. Using the 3-input XOR function as an example, we discuss the variations and show how to quantify properties like worst-case delay and power dissipation, as well as averages of delay and energy consumption per operation over different scenarios. The quantification runs fully automated on charge-transport network models employing probabilistic model checking. This yields exact results instead of approximations obtained from experiments and sampling. Our results show several benefits of reconfigurable transistor circuits over static CMOS implementations. |
11:40 CET | 5.1.3 | DESIGN ENABLEMENT OF CFET DEVICES FOR SUB-2NM CMOS NODES Speaker: Odysseas Zografos, imec, BE Authors: Odysseas Zografos, Bilal Chehab, Pieter Schuddinck, Gioele Mirabeli, Naveen Kakarla, Yang Xiang, Pieter Weckx and Julien Ryckaert, imec, BE Abstract Novel devices that optimize their structure in a three-dimensional fashion and offer significant area gains by reducing standard cell track height are adopted to scale silicon technologies beyond the 5nm node. Such a device is the Complementary FET (CFET), which consists of an n-type channel stacked vertically over a p-type channel. In this paper we review the significant benefits of CFET devices as well as the challenges that arise with their use. More specifically, we focus on the standard cell design challenges as well as the physical implementation ones. We show that to fully exploit the area benefits of the CFET devices, one must carefully select the metal stack used for the physical implementation of a large design. |
12:00 CET | 5.1.4 | MAJORITY-BASED DESIGN FLOW FOR AQFP SUPERCONDUCTING FAMILY Speaker: Giulia Meuli, Synopsys, IT Authors: Giulia Meuli1, Vinicius Possani2, Rajinder Singh2, Siang-Yun Lee3, Alessandro Tempia Calvino4, Dewmini Marakkalage4, Patrick Vuillod5, Luca Amarù6, Scott Chase6, Jamil Kawa7 and Giovanni De Micheli8 1Synopsys, IT; 2Synopsys Inc., US; 3École Polytechnique Fédérale de Lausanne, CH; 4EPFL, CH; 5Synopsys Inc., FR; 6Synopsys Inc, US; 7Synopsys, Inc., US; 8École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract Adiabatic superconducting devices are promising candidates to develop high-speed/low-power electronics. Advances in physical technology must be matched with a systematic development of comprehensive design and simulation tools to bring superconducting electronics to a commercially viable state. Being the technology fundamentally different from CMOS, new challenges are posed to design automation tools: library cells are controlled by multi-phase clocks, they implement the majority logic function, and they have limited fanout. We present a product-level RTL-to-GDSII flow for the design of Adiabatic Quantum-Flux-Parametron (AQFP) electronic circuits, with a focus on the special techniques used to comply with these challenges. In addition, we demonstrate new optimization opportunities for graph matching, resynthesis, and buffer/splitter insertion, improving the state-of-the-art. |
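Talk 5.1.4 above notes that AQFP library cells natively implement the majority function and have limited fanout. As a brief, generic reminder of why majority is a convenient synthesis primitive (textbook majority-logic background, not the flow presented in the talk), MAJ-3 subsumes AND and OR by tying one input to a constant:

```python
# Generic majority-logic background for AQFP-style libraries (not the 5.1.4 flow):
# the 3-input majority gate MAJ(a, b, c) subsumes AND and OR by tying one input
# to a constant, which is why majority-based synthesis suits such technologies.

def maj(a, b, c):
    """3-input majority: 1 iff at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def and2(a, b):
    return maj(a, b, 0)   # MAJ(a, b, 0) = a AND b

def or2(a, b):
    return maj(a, b, 1)   # MAJ(a, b, 1) = a OR b

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            assert and2(a, b) == (a & b)
            assert or2(a, b) == (a | b)
    print("MAJ reproduces AND/OR on all input patterns")
```

The limited fanout of the cells additionally forces buffer/splitter trees on high-fanout nets, which is one of the technology-specific steps the presented flow inserts and optimizes.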
K.4 Lunch Keynote: "AI in the edge; the edge of AI"
Add this session to my calendar
Date: Tuesday, 15 March 2022
Time: 13:10 CET - 14:00 CET
Session chair:
Gi-Joon Nam, IBM, US
Session co-chair:
Marian Verhelst, KU Leuven, BE
In the world of IoT, both humans and objects are continuously connected, collecting and communicating data, in a rising number of applications including Industry 4.0, biomedical, environmental monitoring, and smart houses and offices. Local computation in the edge has become a necessity to limit data traffic. Additionally, embedding AI processing in the edge adds potentially high levels of smart autonomy to these IoT 2.0 systems. Progress in nanoelectronic technology allows this to be done with power- and hardware-efficient architectures and designs. This keynote gives an overview of key solutions, but also describes the main limitations and risks, exploring the edge of edge AI.
Speaker's bio: Georges G.E. Gielen received the MSc and PhD degrees in Electrical Engineering from the Katholieke Universiteit Leuven (KU Leuven), Belgium, in 1986 and 1990, respectively. He currently is Full Professor in the MICAS research division at the Department of Electrical Engineering (ESAT) at KU Leuven. From August 2013 until July 2017 he was also appointed at KU Leuven as Vice-Rector for the Group of Sciences, Engineering and Technology, where he was also responsible for academic human resource management. He was a visiting professor at UC Berkeley and Stanford University. Since 2020 he has been Chair of the Department of Electrical Engineering. His research interests are in the design of analog and mixed-signal integrated circuits, and especially in analog and mixed-signal CAD tools and design automation. He is a frequently invited speaker/lecturer and coordinator/partner of several (industrial) research projects in this area, including several European projects. He has (co-)authored 10 books and more than 600 papers in edited books, international journals and conference proceedings. He is a 1997 Laureate of the Belgian Royal Academy of Sciences, Literature and Arts in the discipline of Engineering. He has been a Fellow of the IEEE since 2002, and received the IEEE CAS Mac Van Valkenburg award in 2015 and the IEEE CAS Charles Desoer award in 2020. He is an elected member of the Academia Europæa.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
13:10 CET | K.4.1 | AI IN THE EDGE; THE EDGE OF AI Speaker and Author: Georges Gielen, KU Leuven, BE Abstract In the world of IoT, both humans and objects are continuously connected, collecting and communicating data, in a rising number of applications including Industry 4.0, biomedical, environmental monitoring, and smart houses and offices. Local computation in the edge has become a necessity to limit data traffic. Additionally, embedding AI processing in the edge adds potentially high levels of smart autonomy to these IoT 2.0 systems. Progress in nanoelectronic technology allows this to be done with power- and hardware-efficient architectures and designs. This keynote gives an overview of key solutions, but also describes the main limitations and risks, exploring the edge of edge AI. |
13:50 CET | K.4.2 | Q&A SESSION Author: Gi-Joon Nam, IBM Research, US Abstract Questions and answers with the speaker |
6.1 Alternative design paradigms for sustainable IoT nodes
Add this session to my calendar
Date: Tuesday, 15 March 2022
Time: 14:30 CET - 16:00 CET
Session chair:
David Atienza, EPFL, CH
Session co-chair:
Ayse Coskun, Boston University, US
While the potential influence of AI in the context of IoT on our daily life is enormous, there are significant challenges related to the ethics and interpretability of AI results, as well as the ecological implications of system design for deep-learning technologies. This special session investigates how the progress in AI technologies can be combined with alternative design paradigms for smart nodes so that the future of IoT can be nurtured and cultivated in a sustainable way for the benefit of society.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
14:30 CET | 6.1.1 | BIO-INSPIRED ENERGY EFFICIENT ALL-SPIKING INTERNET OF THINGS NODES Speaker: Adrian M. Ionescu, EPFL, CH Author: Adrian Ionescu, EPFL, CH Abstract In this talk we will present bio-inspired innovations exploiting phase-change and ferroelectric materials and devices for all-spiking IoT nodes and Edge AI event detection applications. In particular, we will report new progress in (i) electromagnetic and optical spiking sensors based on vanadium dioxides, and (ii) ferroelectric neurons and synapses built with doped high-k dielectrics on 2D semiconducting materials. The future implications for improving the energy efficiency of IoT nodes will be discussed. |
15:00 CET | 6.1.2 | HYBRID DIGITAL-ANALOG SYSTEMS-ON-CHIP FOR EFFICIENT EDGE AI Speaker: Marian Verhelst, KU Leuven, BE Authors: Marian Verhelst1, Kodai Ueyoshi1, Giuseppe Sarda1, Pouya Houshmand1, Ioannis Papistas2, Vikram Jain1, Man Shi1, Peter Vrancx3, Debjyoti Bhattacharjee3, Stefan Cosemans2, Arindam Mallik3 and Peter Debacker3 1KU Leuven, BE; 2Imec and Axelera, BE; 3imec, BE Abstract Deep inference workloads at the edge are characterized by a wide variety of neural network layer topologies and characteristics. While large convolutional layers execute very efficiently on the dense compute-in-memory co-processors appearing in the literature, other layer types (grouped convolutions, layers with low channel count or high precision requirements) benefit from digital execution. This talk discusses a new breed of heterogeneous SoCs, integrating co-processors of different natures into a common processing system with tightly coupled shared memory, to be able to dispatch every layer to the most suitable accelerator. |
15:30 CET | 6.1.3 | 3D COMPUTE CUBES FOR EDGE INTELLIGENCE: NANOELECTRONIC-ENABLED ADAPTIVE SYSTEMS BASED ON JUNCTIONLESS, AMBIPOLAR, AND FERROELECTRIC VERTICAL FETS Speaker: Ian O'Connor, Lyon Institute of Nanotechnology, FR Authors: Ian O'Connor1, David Atienza2, Jens Trommer3, Oskar Baumgartner4, Guilhem Larrieu5 and Cristell Maneux6 1Lyon Institute of Nanotechnology, FR; 2École Polytechnique Fédérale de Lausanne (EPFL), CH; 3Namlab gGmbH, DE; 4Global TCAD Solutions, AT; 5LAAS – CNRS, FR; 6University of Bordeaux, FR Abstract New computing paradigms and technologies are required to respond to the challenges of data-intensive edge intelligence. We propose a triple combination of emerging technologies for the fine interweaving of versatile logic functionality and memory for reconfigurable in-memory computing: vertical junctionless gate-all-around nanowire transistors for ultimate downscaling; ambipolar functionality enhancement for fine-grain flexibility; ferroelectric oxides for non-volatile logic operation. Through a DTCO approach, this talk will describe the design of 3D compute cubes naturally suited to the hardware acceleration of computation-intensive kernels, as well as their integration into computing systems, introducing a system-wide exploration framework to assess their effectiveness. HW/SW optimization will also be described, with a focus on Transformer and Conformer networks and the matrix multiplication kernel, which dominates their run-time. |
7.1 Panel: Autonomous Systems Design as a Research Challenge
Add this session to my calendar
Date: Tuesday, 15 March 2022
Time: 16:30 CET - 18:00 CET
Session chair:
Selma Saidi, TU Dortmund, DE
Session co-chair:
Rolf Ernst, TU Braunschweig, DE
Panellists:
Karl-Erik Arzen, Lund University, SE
Peter Liggesmeyer, Fraunhofer Institute for Experimental Software Engineering IESE, DE
Axel Jantsch, TU Wien, AT
Autonomous systems require specific design methods that leave behavioral freedom and plan for the unexpected without losing trustworthiness and dependability. How does this requirement influence research at major research institutions? How is it reflected in public funding? Should autonomous systems design become a new discipline or should the regular design process be adapted to handle autonomy? The panel will begin with position statements by the panelists, followed by an open discussion with the hybrid audience.
8.1 Young People Program: Career Fair
Add this session to my calendar
Date: Wednesday, 16 March 2022
Time: 16:00 CET - 17:00 CET
Session chair:
Anton Klotz, Cadence, DE
Session co-chair:
Xavier Salazar, Barcelona Supercomputing Center & HiPEAC, ES
The Career Fair aims at bringing together Ph.D. students and potential job seekers with recruiters from EDA and microelectronics companies. In this slot, sponsoring companies present themselves to job seekers and to the DATE community.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
16:00 CET | 8.1.1 | INTRODUCTION TO THE CAREER FAIR Speaker and Author: Anton Klotz, Cadence Design Systems, DE Abstract Introduction to Career Fair. How to apply for listed positions |
16:10 CET | 8.1.2 | CADENCE DESIGN SYSTEMS Speaker and Author: Anton Klotz, Cadence Design Systems, DE Abstract Introducing Cadence Design Systems as employer for young talents |
16:17 CET | 8.1.3 | IMMS Speaker and Author: Eric Schaefer, IMMS, DE Abstract Introducing IMMS as employer for young talents |
16:23 CET | 8.1.4 | SIEMENS EDA Speaker and Author: Janani Muruganandam, Siemens, NL Abstract Introducing Siemens EDA as employer for young talents |
16:30 CET | 8.1.5 | SYNOPSYS Speaker and Author: Markus Wedler, Synopsys, DE Abstract Introducing Synopsys as employer for young talents |
16:37 CET | 8.1.6 | ANSYS Speaker and Author: Helene Tabourier, Ansys, DE Abstract Introducing Ansys as employer for young talents |
16:43 CET | 8.1.7 | INTEL Speaker and Author: Pablo Herrero, INTEL, DE Abstract Introducing Intel as employer for young talents |
16:50 CET | 8.1.8 | BOSCH Speaker and Author: Atefe Dalirsani, BOSCH, DE Abstract Introducing Bosch as employer for young talents |
9.1 Young People Program: Sponsorship Fair
Add this session to my calendar
Date: Wednesday, 16 March 2022
Time: 17:00 CET - 18:30 CET
Session chair:
Sara Vinco, Politecnico di Torino, IT
Session co-chair:
Anton Klotz, Cadence, DE
The Sponsorship Fair aims at bringing together university student teams involved in international competitions and personnel from EDA and microelectronics companies. In this slot, student teams present their activities, success stories and challenges to the DATE audience and to sponsoring companies, in order to build new collaborations.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
17:00 CET | 9.1.1 | DUTCH NAO TEAM Speaker and Author: Thomas Wiggers, University of Amsterdam, NL Abstract Dutch Nao Team is a team of bachelor and master students from the University of Amsterdam that program robots to play football autonomously. Dutch Nao Team competes in the RoboCup SPL League and competitions around the world. |
17:10 CET | 9.1.2 | SQUADRA CORSE POLITO Speaker and Author: Enrico Salvatore, Politecnico di Torino, IT Abstract Squadra Corse PoliTO is the Formula SAE team of the Politecnico di Torino. The team is entirely run by students of the Politecnico di Torino who design, manufacture, test, and compete with formula style race cars in the Formula Student competitions. The team qualified for all the major Formula SAE student competitions of the 2021-2022 season. |
17:20 CET | 9.1.3 | DYNAMIS PRC Speaker and Author: Ishac Oursana, Politecnico di Milano, IT Abstract Dynamis PRC is the Formula Student team of Politecnico di Milano. Originally working on combustion-engine prototypes, Dynamis PRC now also works on electric prototypes and autonomous driving. Dynamis PRC classified 1st in Overall FSN and Overall FSATA in 2019. |
17:30 CET | 9.1.4 | HYPED Speaker and Author: Marina Antonogiannaki, University of Edinburgh, GB Abstract HYPED is the Edinburgh University Hyperloop Team. HYPED co-organises the European Hyperloop Week to promote the development of Hyperloop and connect students with the industry. HYPED has been among the finalists of the SpaceX Hyperloop Pod Competition from 2017 to 2019 and won the Virgin Hyperloop One Global Challenge. |
17:40 CET | 9.1.5 | ONELOOP AT UC DAVIS Speaker and Author: Zbynka Kekula, UC Davis, US Abstract OneLoop is a student-run organization at UC Davis working on developing a Hyperloop pod. Since the first SpaceX competition in 2017, the team has continued to excel in competitions and in furthering Hyperloop research. |
17:50 CET | 9.1.6 | NEUROTECH LEUVEN Speaker and Author: Jonah Van Assche, KU Leuven, BE Abstract NeuroTech Leuven is a team of students from KU Leuven, Belgium, who are interested in all things "neuro", ranging from neuroscience to neurotechnology. The NeuroTech Leuven team takes part in the NeuroTechX competition. |
18:00 CET | 9.1.7 | Q&A SESSION Authors: Sara Vinco1 and Anton Klotz2 1Politecnico di Torino, IT; 2Cadence Design Systems, DE Abstract This poster session allows a closer interaction of student teams with EDA and microelectronic companies, to allow discussion of sponsorship opportunities, e.g., in terms of monetary sponsorships, licenses, and tutorials. |
10.1 PhD Forum
Add this session to my calendar
Date: Wednesday, 16 March 2022
Time: 18:30 CET - 20:30 CET
Session chair:
Gabriela Nicolescu, École Polytechnique de Montréal, CA
Session co-chair:
Mahdi Nikdast, Colorado State University, US
The PhD Forum is an online poster session hosted by EDAA, ACM-SIGDA, and IEEE CEDA for PhD students who have completed their PhD thesis within the last 12 months or who are close to completing their thesis work. It represents an excellent opportunity for them to get feedback on their research and for industry to get a glance at the state of the art in system design and design automation.
Time | Label | Presentation Title, Authors and Abstract |
---|---|---|
18:30 CET | 10.1.1 | NOVEL ATTACK AND DEFENSE STRATEGIES FOR ENHANCED LOGIC LOCKING SECURITY Speaker: Lilas Alrahis, New York University Abu Dhabi, AE Authors: Lilas Alrahis1 and Hani Saleh2 1New York University Abu Dhabi, AE; 2Khalifa University, AE Abstract The globalized and, thus, distributed semiconductor supply chain creates an attack vector for untrusted entities to steal the intellectual property (IP) of a design. To ward off the threat of IP theft, researchers have developed various countermeasures such as state-space obfuscation, split manufacturing, and logic locking (LL). LL is a holistic design-for-trust technique that aims to protect the design IP from untrustworthy entities throughout the IC supply chain by locking the functionality of the design. State-of-the-art LL solutions such as provably secure logic locking (PSLL) and scan locking/obfuscation aim to offer protection against immediate attacks such as the Boolean satisfiability (SAT)-based attack. However, these implementations mostly focus on thwarting the SAT-based attack, leaving them vulnerable to other unexplored threats. The underlying research objective of this Ph.D. work is enhancing the security of LL by exposing and then addressing its security vulnerabilities. |
18:30 CET | 10.1.2 | PROPER ABSTRACTIONS FOR DIGITAL ELECTRONIC CIRCUITS: A PHYSICALLY GUIDED APPROACH Speaker: Jurgen Maier, TU Wien, AT Author: Jürgen Maier, TU Wien, AT Abstract In this thesis I show that developing abstractions, which are able to describe the behavior of digital electronic circuits in a simple yet accurate fashion, can be efficiently guided by identifying the underlying physical processes. Based on transistor-level analysis, analog SPICE simulations and even formal proofs, I provide approximations of the analog signal trajectories inside a circuit and of the signal propagation delay in the digital domain. In addition, I introduce methods for an efficient characterization of the Schmitt trigger, including its metastable and dynamic behavior. Overall, the developed abstractions are highly faithful in the sense that only physically reasonable behavior can be modeled, and vice versa. This leads to more powerful, accurate and trustworthy results, which allows one to identify problematic spots in a circuit with higher confidence in less time. Nevertheless, no "silver bullet" w.r.t. modeling abstractions could be found, meaning that each abstraction requires careful analysis of the physical behavior to achieve the optimal performance, accuracy and coverage. |
18:30 CET | 10.1.3 | RETRAINING-FREE WEIGHT-SHARING FOR CNN COMPRESSION Speaker: Etienne Dupuis, Lyon Institute of Nanotechnology, FR Authors: Etienne Dupuis1, David Novo2, Alberto Bosio3 and Ian O'Connor3 1Institut des Nanotechnologies de Lyon, FR; 2CNRS, LIRMM, University of Montpellier, FR; 3Lyon Institute of Nanotechnology, FR Abstract The Weight-Sharing (WS) technique gives promising results in compressing Convolutional Neural Networks (CNNs), but it requires careful determination of the shared values for each layer of a given CNN. The WS Design Space Exploration (DSE) time can easily explode for state-of-the-art CNNs. We propose a new heuristic approach to drastically reduce the exploration time without sacrificing the quality of the output. The results, obtained on recent CNNs (GoogleNet, ResNet50V2, MobileNetV2, InceptionV3, and EfficientNet) trained with the ImageNet dataset, show over 5× memory compression at an acceptable accuracy loss (complying with the MLPerf quality target) without any retraining step. (A generic illustration of the weight-sharing idea follows this session's table.) Index Terms—Convolutional Neural Network, Deep Learning, Computer Vision, Hardware Accelerator, Design Space Exploration, Approximate Computing, Weight-Sharing |
18:30 CET | 10.1.4 | INTELLIGENT CIRCUIT DESIGN AND IMPLEMENTATION WITH MACHINE LEARNING IN EDA Speaker and Author: Zhiyao Xie, Duke University, US Abstract EDA (Electronic Design Automation) technology has achieved remarkable progress over the past decades, from attaining merely functionally correct designs to handling multi-million-gate circuits. However, chip design is not completely automatic yet in general and the gap is not easily surmountable. For example, automation of EDA flow is still largely restricted to individual point tools with little interplay across different tools and design steps. Tools in early steps cannot well judge if their solutions may eventually lead to satisfactory designs, and the consequence of a poor solution cannot be found until very late. A major weakness of these traditional EDA technologies is the insufficient prior design knowledge reuse. Conventional optimization techniques construct solutions from scratch even if similar optimizations have already been performed, perhaps even repeatedly. Predictive models are either inaccurate or dependent on trial designs, which are very time- and resource-consuming. These limitations point to a major strength of machine learning (ML) – the capability to explore highly complex correlations between two design stages based on prior data. During my Ph.D. study, I construct multiple fast yet accurate models for various design objectives in EDA with customized ML algorithms. |
18:30 CET | 10.1.5 | CROSS-LAYER TECHNIQUES FOR ENERGY-EFFICIENCY AND RESILIENCY OF ADVANCED MACHINE LEARNING ARCHITECTURES Speaker: Alberto Marchisio, TU Wien, AT Authors: Alberto Marchisio1 and Muhammad Shafique2 1TU Wien (TU Wien), AT; 2New York University Abu Dhabi, AE Abstract Machine Learning (ML) algorithms have shown a high level of accuracy in several tasks; therefore, ML-based applications are widely used in many systems and platforms. However, the development of efficient ML-based systems requires addressing two key research problems: energy efficiency and security. Current trends show the growing interest in the community for complex ML models, such as Deep Neural Networks (DNNs), Capsule Networks (CapsNets), and Spiking Neural Networks (SNNs). Besides their high learning capabilities, their complexity poses several challenges to addressing the above-discussed research problems. In this work, we explore cross-layer concepts engaging both hardware- and software-level techniques to build resilient and energy-efficient architectures for these networks. |
18:30 CET | 10.1.6 | DESIGN & ANALYSIS OF AN ON-CHIP PROCESSOR FOR THE AUTISM SPECTRUM DISORDER (ASD) CHILDREN ASSISTANCE USING THEIR EMOTIONS Speaker: Abdul Rehman Aslam, Lahore University of Management Sciences, PK Authors: Abdul Rehman Aslam and Muhammad Awais Bin Altaf, Lahore University of Management Sciences, Pakistan, PK Abstract Autism Spectrum Disorder (ASD) is a neurological disorder that affects the cognitive and emotional abilities of children. The number of ASD patients has increased drastically in the past decade. The World Health Organization estimates that around 1 out of every 160 children is an ASD patient in the United States. The actual number of patients may be substantially higher, as many patients are not reported due to the stigma associated with the ASD diagnosis methods. The ASD statistics can be more severe in underdeveloped and third-world countries that lack basic health facilities for a major part of the population. The conventional Autism Diagnostic Observation Schedule (ADOS-2) diagnosis methods require extensive behavioral evaluations and frequent visits of the children to neurologists. These extensive evaluations lead to late diagnosis and hence late treatment. The chronic ailment of the central nervous system in ASD causes the degradation of emotional and cognitive abilities. ASD patients suffer from attention deficit hyperactivity disorder, memory issues, inability to take decisions, emotional issues, and lack of self-control. This lack of self-control is most pronounced in their emotions: they have highly imbalanced emotions and face negative emotional outbursts (NEOBs). NEOBs are impulses of negative emotions causing self-injuries and suicide attempts leading to death. Long-term continuous monitoring with neurofeedback of human emotions is therefore crucial for ASD patients. The timely prediction of NEOBs is crucial in mitigating their harmful effects, and emotion prediction can be used to regulate the emotions by controlling these NEOBs. This need can be addressed by an electroencephalography (EEG)-based, non-invasive, real-time and continuous emotion-prediction system on chip (SoC) embedded inside a headband. This work targets the design and analysis of the digital back-end (DBE) processor for a fully integrated wearable emotion-prediction SoC. The SoC involves an analog front-end (AFE) for EEG data acquisition and a DBE processor for emotion prediction. The miniaturized low-power processor can be embedded in a headband (patch sensor) for the timely prediction of NEOBs. An SoC that predicts NEOBs and records their pattern was designed and implemented in a 0.18µm 1P6M CMOS process. The dual-channel deep neural network (DNN)-based emotion-classification processor utilizes only two EEG channels for emotion classification. The lowest number of channels minimizes the patient's discomfort while wearing the headband SoC. The DBE classification processor utilizes only two features per channel to minimize area and power and overcome overfitting problems. The proposed approximated skewness indicator feature was implemented using an 86X lower area (gate count) after tuning the conventional mathematical formula for skewness. The DNN classifier was implemented in a semi-pipelined manner after instruction rescheduling and a customized arithmetic and logic unit implementation with a 34X lower area (gate count). The sigmoid activation function was implemented with 50% lower memory resources due to the symmetry between positive and negative sigmoid values (σ(−x) = 1 − σ(x), so only one half of the value range needs to be stored). An overall area efficiency of 71% was achieved for the DNN classification unit. The 16 mm2 SoC is implemented in a 0.18µm 1P6M CMOS process and consumes 10.13 µJ/classification for 2-channel operation while achieving an average accuracy of >85% on multiple emotion databases and in real-time testing. The DBE processor for the wearable non-invasive emotion-classification system was fabricated using a 0.18µm CMOS process. The processor has an overall energy efficiency of 10.13 µJ per classification. This is the world's first SoC for emotion prediction targeting ASD patients with minimal hardware resources. The SoC can also be used for ASD prediction with an excellent classification accuracy of 95%. |
18:30 CET | 10.1.7 | RESILIENCE AND ENERGY-EFFICIENCY FOR DEEP LEARNING AND SPIKING NEURAL NETWORKS FOR EMBEDDED SYSTEMS Speaker: Rachmad Vidya Wicaksana Putra, TU Wien, AT Authors: Rachmad Vidya Wicaksana Putra1 and Muhammad Shafique2 1TU Wien, AT; 2New York University Abu Dhabi, AE Abstract Neural networks (NNs) have become prominent machine learning (ML) algorithms because they achieve state-of-the-art accuracy for various data analytic applications, such as object recognition, healthcare, and autonomous driving. However, deploying the advanced NN algorithms, such as deep neural networks (DNNs) and spiking neural networks (SNNs), to the resource-constrained embedded systems is challenging because of their memory- and compute-intensive nature. Moreover, the existing SNN-based systems still cannot adapt to dynamic operating environments that make the offline-learned knowledge obsolete, and suffer from the negative impact of hardware-induced faults, thereby degrading the accuracy. Therefore, in this PhD work, we explore cross-layer hardware (HW)- and software (SW)-level techniques for building resilient and energy-efficient NN-based systems to enable their deployment for embedded applications in a reliable manner under diverse operating conditions. |
18:30 CET | 10.1.8 | MODELING AND OPTIMIZATION OF EMERGING AI ACCELERATORS UNDER RANDOM UNCERTAINTIES Speaker and Author: Sanmitra Banerjee, Duke University, US Abstract Artificial intelligence (AI) accelerators based on carbon nanotube FETs (CNFETs) and silicon-photonic neural networks (SPNNs) enable ultra-low-energy and ultra-high-speed matrix multiplication. However, these emerging technologies are susceptible to inevitable fabrication-process variations and manufacturing defects. My Ph.D. dissertation focuses on the development of a comprehensive modeling framework to analyze such uncertainties and their impact on emerging AI accelerators. We show that the nature of uncertainties in CNFETs and SPNNs differs from that in Si CMOS circuits and, as such, the application and effectiveness of conventional EDA and test approaches are significantly restricted when applied to such emerging technologies. To address this, we also propose several novel technology-aware design optimization and test generation methods to facilitate yield ramp-up of next-generation AI accelerators. |
18:30 CET | 10.1.9 | LOGIC SYNTHESIS IN THE MACHINE LEARNING ERA: IMPROVING CORRELATION AND HEURISTICS Speaker: Walter Lau Neto, University of Utah, US Authors: Walter Lau Neto and Pierre-Emmanuel Gaillardon, University of Utah, US Abstract This extended abstract proposes to explore current advances in Machine Learning (ML) techniques to enhance both abstraction and heuristics in logic synthesis. We start by proposing a Convolutional Neural Network (CNN) model to predict, early in the flow, post-Place & Route (PnR) critical paths, and a method to use this information and optimize these paths, achieving a 15.3% improvement in ADP and an 18.5% improvement in EDP. We also present a CNN model to be used during technology mapping that features a novel cut-pruning policy, improving the mapping delay by an average of 10% when compared to the ABC tool, the state-of-the-art open-source technology mapper, at a cost of 2% area. Our model for technology mapping replaces a core heuristic, which to the best of our knowledge is a novel contribution. Most previous work on ML in EDA uses ML to forecast metrics and tune the flow, not embedded as a core heuristic. |
18:30 CET | 10.1.10 | ACCELERATING CNN INFERENCE NEAR TO THE MEMORY BY EXPLOITING PARALLELISM, SPARSITY, AND REDUNDANCY Speaker: Palash Das, Indian Institute of Technology, Guwahati, IN Authors: Palash Das and Hemangee Kapoor, Indian Institute of Technology, Guwahati, IN Abstract Convolutional Neural Networks (CNNs) have become a promising tool for deep learning, specifically in the domain of computer vision. Deep CNNs have widespread use in real-life applications like image classification, object detection, and image segmentation. The inference phase of CNNs is often used in real time for faster prediction and classification and hence demands high performance and energy efficiency from the system. Towards designing such systems, we implement multiple strategies that make real-time inference exceptionally faster in exchange for minimal area/power overhead. We implement multiple custom accelerators with various capabilities and integrate them close to the main memory to reduce the memory access latency/energy using the near-memory processing (NMP) concept. In our first contribution, we design custom hardware, the convolutional logic unit (CLU), and integrate it close to a 3D memory, specifically the hybrid memory cube (HMC). We propose a dataflow that helps in parallelizing the CNN tasks for their concurrent execution. In the second contribution, we propose an architecture that leverages the benefits of NMP using HMC, exploiting parallelism and data sparsity. In the third contribution, apart from NMP and parallelism, the proposed hardware can also remove the redundant multiplications of inference by a lookaside memory (LAM)-based search technique. This makes the inference substantially faster because of the reduced number of costly multiplication operations. Lastly, we investigate the efficacy of NMP with conventional DRAM while accelerating the inference. While implementing NMP in DRAM, we also explore the design space with our designed hardware modules based on parameters like performance, power consumption, and area overhead. |
18:30 CET | 10.1.11 | DESIGN AUTOMATION FOR ADVANCED MICROFLUIDIC BIOCHIPS Speaker and Author: Debraj Kundu, IITR, IN Abstract The science behind handling fluids at the nanoliter-to-femtoliter scale in order to automate a bio-application is termed microfluidics, and the devices used in this process are generally called biochips. Due to recent advancements in the fabrication technologies of these biochips, their design automation field has boomed over the last decade. Integration, precision, and high throughput are the main advantages of biochips over lab-based macro systems. Based on their working principle, biochips can be broadly classified as continuous flow-based microfluidic biochips (CFMBs) and digital microfluidic biochips (DMFBs). To automate various bio-applications on a biochip, different design automation methodologies are required for different kinds of biochips. We provide rigorous and elegant design automation techniques for sample preparation, fluid loading, placement of mixers, and scheduling of mixing graphs in MEDA, PMD and CMF biochips. |
18:30 CET | 10.1.12 | ULTRA-FAST TEMPERATURE ESTIMATION METHODS FOR ARCHITECTURE-LEVEL THERMAL MODELING Speaker and Author: Hameedah Sultan, Indian Institute of Technology Delhi, IN Abstract As the power density of modern-day chips has increased, the chip temperature, too, has risen steadily. High temperature causes several adverse effects, degrading the chip's performance and reliability. It also increases the leakage power, which further increases the on-chip temperature, resulting in a feedback effect. In order to carry out temperature-aware design optimization, it is often necessary to conduct thousands of temperature simulations at various stages of the design cycle, and thus fast simulation without a concomitant loss in accuracy is essential. State-of-the-art works in thermal estimation have serious limitations in modeling some important thermal effects. Additionally, these methods are slow. We overcome the limitations of these works by developing fast Green's function-based analytical methods. |
18:30 CET | 10.1.13 | MULTI-OBJECTIVE DIGITAL VLSI DESIGN OPTIMISATION Speaker and Author: Linan Cao, University of York, GB Abstract Modern VLSI design's complexity and density have been increasing exponentially over the past 50 years, recently reaching a stage that allows heterogeneous, many-core systems and numerous functions to be integrated into a tiny silicon die. These achievements are accomplished by pushing process technology to its physical limits. Transistor shrinking has succeeded with continuous improvements in the physical dimension, switching frequency and power efficiency of integrated circuits (ICs), allowing embedded electronic systems to be used in more and more real-world automated applications. However, as advanced semiconductor technologies come ever closer to the atomic scale, the transistor scaling challenge and stochastic performance variations intrinsic to fabrication emerge. Electronic design automation (EDA) tools handle the growing size and complexity of modern electronic designs by breaking down systems into smaller blocks or cells, introducing different levels of abstraction. In the field of digital very large scale integration (VLSI) design, comprehensive and mature industry-standard design flows are available to tape out chips. This complex process consists of several steps including logic design, logic synthesis, physical implementation and pre-silicon physical verification. However, in this staged, hierarchical design approach, where each step is optimised independently, overheads and inefficiency can accumulate in the resulting overall design. Designers and EDA vendors have to handle these challenges from process technology, design complexity and growing scale, which may otherwise result in inferior design quality, even failures, and lower design yields under time-to-market pressure. Multiple or many design objectives and constraints emerge during the design process and often need to be dealt with simultaneously. Multi-objective evolutionary algorithms (MOEAs) show flexible capabilities in maintaining multiple variable components and factors in uncertain environments. The VLSI design process involves a large number of available parameters, both from designs and from EDA tools. This provides many potential optimisation avenues where evolutionary algorithms can excel. This PhD work investigates the application of evolutionary techniques for digital VLSI design optimisation. Automated multi-objective optimisation frameworks, compatible with industrial design flows and foundry technologies, are proposed to improve solution performance, expand the feasible design space, and handle complex physical floorplan constraints by tuning designs at gate level. Methodologies for enriching standard cell libraries with additional drive strengths are also introduced to cooperate with the multi-objective optimisation frameworks, e.g., via subsequent hill-climbing, providing a richer pool of solutions optimised for different trade-offs. The experiments in this thesis demonstrate that multi-objective evolutionary algorithms, derived from biological inspiration, can assist the digital VLSI design process, in an industrial design context, to more efficiently search for well-balanced trade-off solutions as well as optimised design-space coverage. The expanded drive granularity of standard cells can push the performance of silicon technologies by offering improved solutions for critical objectives. The achieved optimisation results deliver better trade-offs in power, performance and area (PPA) than using standard EDA tools alone, not only for a single circuit solution but across the entire standard-tool-produced design space. |
18:30 CET | 10.1.14 | TINYDL: EFFICIENT DESIGN OF SCALABLE DEEP NEURAL NETWORKS FOR RESOURCE-CONSTRAINED EDGE DEVICES Speaker and Author: Mohammad Loni, Mälardalen University, SE Abstract The main aim of my Ph.D. thesis is to develop theoretical foundations and practical algorithms that (i) enable designing scalable and energy-efficient DNNs with a low energy footprint, (ii) facilitate fast deployment of complicated DL models for a diverse set of Edge devices while satisfying given hardware constraints, and (iii) improve the accuracy of network quantization methods for large-scale datasets. To address these research challenges, I developed (i) the ADONN, DeepMaker, NeuroPower, DenseDisp and FastStereoNet frameworks during my Ph.D. studies to design hardware-friendly NAS methods with minimum design cost, and (ii) novel ternarization frameworks named TOT-Net and TAS that prevent the accuracy degradation of quantization techniques. |
18:30 CET | 10.1.15 | DECISION DIAGRAMS IN QUANTUM DESIGN AUTOMATION Speaker and Author: Stefan Hillmich, Johannes Kepler University Linz, AT Abstract The impact quantum computing may achieve hinges on Computer-Aided Design (CAD) keeping up with the increasing power of physical realizations. The complexity of quantum computing has to be tackled with dedicated methods and data structures as well as a close cooperation between the CAD community and physicists. The main contribution of the thesis is to narrow the emerging design gap for quantum computing by bringing established methods of the CAD community to the quantum world. More precisely, the work focuses on the application of decision diagrams to the areas of quantum circuit simulation, estimation of observables in quantum chemistry, and technology mapping. The supporting paper is attached to the extended abstract. |
18:30 CET | 10.1.16 | DEPENDABLE RECONFIGURABLE SCAN NETWORKS Speaker: Natalia Lylina, University of Stuttgart, DE Authors: Natalia Lylina and Hans-Joachim Wunderlich, University of Stuttgart, DE Abstract The dependability of modern devices is enhanced by integrating an extensive number of non-functional instruments. These are needed to facilitate cost-efficient bring-up, debug, test, diagnosis, and adaptivity in the field and might include, e.g., sensors, aging monitors, and Logic and Memory Built-In Self-Test (BIST) registers. Reconfigurable Scan Networks (RSNs) provide a flexible way to access such instruments as well as the device's registers throughout the lifetime, starting from PSV through manufacturing test and finally during in-field test. At the same time, the dependability properties of the device-under-test (DUT) can be affected by an improper RSN integration. This doctoral project overcomes these problems and establishes a methodology to integrate dependable RSNs for a given device, considering dependability aspects such as accessibility via RSNs, testability of RSNs, and security compliance of RSNs with the underlying device-under-test. The remainder of this extended abstract is structured as follows. First, background information about RSNs is provided, followed by the challenges of dependability-aware RSN integration. Next, the objectives and contributions of this work are summarized for specific dependability properties. |
18:30 CET | 10.1.17 | BREAKING THE ENERGY CAGE OF INSECT-SCALE AUTONOMOUS DRONES: INTERPLAY OF PROBABILISTIC HARDWARE AND CO-DESIGNED ALGORITHMS Speaker: Priyesh Shukla, University of Illinois at Chicago, US Authors: Priyesh Shukla and Amit Trivedi, University of Illinois at Chicago, US Abstract Autonomy in insect-scale drones is challenged by highly constrained area and power budgets. Robustness amidst noisy sensory inputs and surroundings is also critical. To address this, we present two compute-in-memory (CIM) frameworks for insect-scale drone localization. Our first framework is a floating-gate (FG) inverter-array-based CIM (for Bayesian particle filtering) that efficiently evaluates the log-likelihood of the drone's pose, which otherwise demands a heavy computational workload using conventional digital processing. Our second method is Monte-Carlo dropout (MC-Dropout)-based deep neural network (DNN) inference in an all-digital 8T-SRAM (static random access memory) CIM. The CIM is equipped with additional MC-Dropout inference primitives to account for uncertainty in the drone's pose prediction. We discuss compute reuse and optimization strategies for MC-Dropout schedules to gain a significant reduction in this (approximated Bayesian) DNN workload. FG-CIM-based localization is 25x more energy efficient than conventional digital processing, and the SRAM-CIM for MC-Dropout inference consumes 28pJ for 30 MC-Dropout inference iterations (3 TOPS/W). |
18:30 CET | 10.1.18 | RESILIENT: PROTECTING DESIGN IP FROM MALICIOUS ENTITIES Speaker: Nimisha Limaye, New York University, US Authors: Nimisha Limaye1 and Ozgur Sinanoglu2 1New York University, US; 2New York University Abu Dhabi, AE Abstract The globalization of the integrated circuit (IC) supply chain has opened up avenues for untrusted entities with the malicious intent of intellectual property (IP) piracy and overproduction of ICs. These malicious entities encompass the foundry, the test facility, and the end user. An untrusted foundry can readily obtain the unprotected design IP from the design house, and a test facility or an end user can reverse-engineer the chip using widely available tools and extract the underlying design IP to pirate or overproduce the ICs. We first perform an exhaustive security analysis of the state-of-the-art logic locking techniques and propose various attacks. Further, we propose countermeasures to thwart attacks from all the malicious entities in the supply chain. Through our solutions, we allow security-enforcing designers to protect their design IP at various abstraction levels. Our solution can protect not just digital designs but also mixed-signal designs. |
18:30 CET | 10.1.19 | ALGORITHM-ARCHITECTURE CO-DESIGN FOR ENERGY-EFFICIENT, ROBUST, AND PRIVACY-PRESERVING MACHINE LEARNING Speaker and Author: Souvik Kundu, USC, US Abstract My Ph.D. research includes three major aspects of algorithm-architecture co-design for machine learning accelerators: (1) energy efficiency via novel training-efficient pruning, quantization, and distillation; (2) robust model training for safety-critical edge applications; and (3) analysis of model and data privacy of the associated IPs. |
18:30 CET | 10.1.20 | PERFORMANCE-AWARE DESIGN-SPACE OPTIMIZATION AND ATTACK MITIGATION FOR EMERGING HETEROGENEOUS ARCHITECTURES Speaker and Author: Mitali Sinha, IIIT Delhi, IN Abstract The growing system sizes and time-to-market pressure of heterogeneous SoCs compel the chip designers to analyze only part of the design space, leading to suboptimal Intellectual Property (IP) designs. Hence, different processing cores like accelerators are generally designed as standalone IP blocks by third-party vendors and chip designers often over-provision the amount of on-chip resources required to add flexibility to each IP design. Although this modularity simplifies IP design, integrating these off-the-shelf IP blocks into a single SoC may overshoot the resource budget of the underlying system. Furthermore, the integration of third-party IPs alongside other on-chip modules makes the system vulnerable to security threats. This work addresses the challenges involved in designing efficient heterogeneous SoCs by optimizing the utilization of on-chip resources and mitigating performance-based security threats. |
18:30 CET | 10.1.21 | PRACTICAL SIDE-CHANNEL AND FAULT ATTACKS ON LATTICE-BASED CRYPTOGRAPHY Speaker: Prasanna Ravi, Nanyang Technological University, SG Authors: Prasanna Ravi1, Anupam Chattopadhyay1 and Shivam Bhasin2 1Nanyang Technological University, SG; 2Temasek Laboratories, Nanyang Technological University, SG Abstract The possibility of large-scale quantum computers in the future has been an ever-growing threat to the existing public-key infrastructure, which is predominantly based on classical RSA and ECC-based public-key cryptography. This prompted NIST to initiate a global standardization process for alternative quantum-attack-resistant Public Key Encryption (PKE), Key Encapsulation Mechanisms (KEM) and Digital Signatures (DSS), better known as Post-Quantum Cryptography (PQC). The PQC standardization process started in 2017 with 69 submissions and is currently in its third and final round with seven (7) main finalist candidates and eight (8) alternate finalist candidates. Among these fifteen (15) finalist candidates, seven (7) belong to a single category, referred to as lattice-based cryptography. These schemes are based on hard geometric problems that are conjectured to be computationally intractable for quantum computers. NIST laid out several evaluation criteria for the standardization process, which include theoretical Post-Quantum (PQ) security guarantees, implementation cost and performance. Along with them, resistance against physical attacks such as Side-Channel Analysis (SCA) and Fault Injection Analysis (FIA) has also emerged as an important criterion for the standardization process. This is especially relevant for the adoption of PQC in embedded devices, which will be used in environments where an attacker can have unimpeded physical access to the target device. We therefore focus on evaluating the security of practical implementations of lattice-based schemes against SCA and FIA. We have identified novel SCA and FIA vulnerabilities that led to practical attacks on implementations of several lattice-based schemes. Most of our attacks exploit vulnerabilities inherent in the algorithms of lattice-based schemes, which makes our attacks adaptable to different implementation platforms (hardware and software). |
18:30 CET | 10.1.22 | MEMORY INTERFERENCE AND MITIGATIONS IN RECONFIGURABLE HESOCS FOR EMBEDDED AI Speaker: Gianluca Brilli, University of Modena and Reggio Emilia, IT Authors: Gianluca Brilli, Alessandro Capotondi, Paolo Burgio, Andrea Marongiu and Marko Bertogna, University of Modena and Reggio Emilia, IT Abstract Recent advances in high-performance embedded systems have paved the way for next-generation applications that were impractical a few decades ago, such as Deep Neural Networks (DNNs). DNNs are widely adopted in several embedded domains and in particular in so-called Cyber Physical Systems (CPS). Examples of CPS are autonomous robots, which typically integrate one or more neural networks into their navigation systems for perception and localization tasks. To match this need, manufacturers of high-performance embedded chips are increasingly adopting heterogeneous designs (HeSoCs), in which sequential processors are combined with massively parallel accelerators used to perform ML tasks in an energy-efficient manner. These systems typically follow a Commercial-Off-The-Shelf (COTS) organization, where the memory hierarchy, composed of multiple cache layers and a main memory (DRAM), is shared among the computational engines of the system. On the one hand, this scheme improves time-to-market and system scalability, and in general provides good average-case performance. However, it is not always adequate for applications where, by construction, the system must guarantee bounded performance even in the worst case. A shared memory organization creates contention on shared resources [1]–[3], where the execution time of a task also depends on the number of other tasks that access a given shared resource in the same time interval. The main aspects addressed in this work are: (i) a characterization of state-of-the-art embedded neural network engines, to study the typical workload of a DNN and the impact it can have on the system; (ii) a deep memory-interference characterization on HeSoCs, with particular reference to FPGA-based ones; (iii) architectural solutions to mitigate memory interference and improve the low memory-bandwidth utilization of PREM-like schemes. |
IP.1_1 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_1.1 | (Best Paper Award Candidate) A SOFTWARE ARCHITECTURE TO CONTROL SERVICE-ORIENTED MANUFACTURING SYSTEMS Speaker: Sebastiano Gaiardelli, Università di Verona, IT Authors: Sebastiano Gaiardelli1, Stefano Spellini1, Marco Panato2, Michele Lora3 and Franco Fummi1 1Università di Verona, IT; 2Universita' di Verona, IT; 3University of Southern California, US Abstract This paper presents a software architecture extending the classical automation pyramid to control and reconfigure flexible, service-oriented manufacturing systems. At the Planning level, the architecture requires a Manufacturing Execution System (MES) consistent with the International Society of Automation (ISA) standard. Then, the Supervisory level is automated by introducing a novel component, called Automation Manager. The new component interacts upward with the MES, and downward with a set of servers providing access to the manufacturing machines. The communication with machines relies on the OPC Unified Architecture (OPC UA) standard protocol, which allows exposing production tasks as “services”. The proposed software architecture has been prototyped to control a real production line, originally controlled by a commercial MES, unable to fully exploit the flexibility provided by the case study manufacturing system. Meanwhile, the proposed architecture is fully exploiting the production line’s flexibility. |
IP.1_1.2 | (Best Paper Award Candidate) COMPREHENSIVE AND ACCESSIBLE CHANNEL ROUTING FOR MICROFLUIDIC DEVICES Speaker: Philipp Ebner, Johannes Kepler University, AT Authors: Gerold Fink, Philipp Ebner and Robert Wille, Johannes Kepler University Linz, AT Abstract Microfluidics is an emerging field that allows processes usually conducted with unwieldy laboratory equipment to be miniaturized, integrated, and automated inside a single device, resulting in so-called "Labs-on-a-Chip" (LoCs). The design process of channel-based LoCs is thus far still mainly conducted manually, resulting in time-consuming tasks and error-prone designs. This also holds for the routing process, where multiple components inside an LoC should be connected according to a specification. In this work, we present a routing tool which considers the particular requirements of microfluidic applications and automates the routing process. In order to make the tool more accessible (even to users with little to no EDA expertise), it is incorporated into a user-friendly and intuitive online interface. |
IP.1_1.3 | (Best Paper Award Candidate) XST: A CROSSBAR COLUMN-WISE SPARSE TRAINING FOR EFFICIENT CONTINUAL LEARNING Speaker: Fan Zhang, Arizona State University, US Authors: Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao and Deliang Fan, Arizona State University, US Abstract Leveraging ReRAM crossbar-based In-Memory-Computing (IMC) to accelerate single-task DNN inference has been widely studied. However, using the ReRAM crossbar for continual learning has not been explored yet. In this work, we propose XST, a novel crossbar column-wise sparse training framework for continual learning. XST significantly reduces the training cost and saves inference energy. More importantly, it is friendly to existing crossbar-based convolution engines with almost no hardware overhead. Compared with the state-of-the-art CPG method, experiments show that XST achieves 4.95% higher accuracy. Furthermore, XST demonstrates a ~5.59X training speedup and 1.5X inference energy saving. |
IP.1_2 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_2.1 | (Best Paper Award Candidate) ENERGY-EFFICIENT BRAIN-INSPIRED HYPERDIMENSIONAL COMPUTING USING VOLTAGE SCALING Speaker: Xun Jiao, Villanova University, US Authors: Sizhe Zhang1, Ruixuan Wang1, Dongning Ma1, Jeff Zhang2, Xunzhao Yin3 and Xun Jiao1 1Villanova University, US; 2Harvard University, US; 3Zhejiang University, CN Abstract Brain-inspired hyperdimensional computing (HDC) is an emerging computational paradigm that mimics brain cognition and leverages hyperdimensional vectors with fully distributed holographic representation and (pseudo) randomness. Recently, HDC has demonstrated promising capability in a wide range of applications such as medical diagnosis, human activity recognition, and voice classification. Despite the growing popularity of HDC, its memory-centric computing characteristics make the associative memory implementation consume significant energy due to the massive data storage and processing. While voltage scaling has been studied intensively to reduce memory energy dissipation, it can introduce errors that degrade the output quality. In this paper, we systematically study and leverage the application-level error resilience of HDC to reduce the energy consumption of HDC associative memory by using voltage scaling. Evaluation results on various applications show that our proposed approach can achieve 47.6% energy saving on associative memory with a negligible accuracy loss (<1%). We further explore two low-cost error masking methods, i.e., word masking and bit masking, to mitigate the impact of voltage-scaling-induced errors. Experimental results show that the proposed word masking (bit masking) method can further enhance energy saving up to 62.3% (72.5%) with accuracy loss <1%. |
IP.1_2.2 | ERROR GENERATION FOR 3D NAND FLASH MEMORY Speaker: Weihua Liu, Huazhong University of Science and Technology, CN Authors: Weihua Liu, Fei Wu, Songmiao Meng, Xiang Chen and Changsheng Xie, Huazhong University of Science and Technology, CN Abstract Three-dimensional (3D) NAND flash memory is the preferred storage component of solid-state drives (SSDs) for its high ratio of capacity to cost. Optimizing the reliability of modern SSDs requires testing and collecting a large amount of real-world error data from 3D NAND flash memory. However, test costs have surged dozens of times as its capacity increases. It is imperative to reduce the costs of testing denser, high-capacity flash memory. To facilitate this, in this paper we aim to enable reproducing error data efficiently for 3D NAND flash memory. We use a conditional generative adversarial network (cGAN) to learn the error distribution with multiple interferences and generate diverse error data comparable to real-world data. Evaluation results demonstrate that error generation with cGAN is feasible and efficient. |
IP.1_2.3 | ESTIMATING VULNERABILITY OF ALL MODEL PARAMETERS IN DNN WITH A SMALL NUMBER OF FAULT INJECTIONS Speaker: Yangchao Zhang, Osaka University, JP Authors: Yangchao Zhang1, Hiroaki Itsuji2, Takumi Uezono2, Tadanobu Toba2 and Masanori Hashimoto3 1Osaka University, JP; 2Hitachi Ltd., JP; 3Kyoto University, JP Abstract The reliability of deep neural networks (DNNs) against hardware errors is essential as DNNs are increasingly employed in safety-critical applications such as automatic driving. Transient errors in memory, such as radiation-induced soft error, may propagate through the inference computation, resulting in unexpected output, which can adversely trigger catastrophic system failures. As a first step to tackle this problem, this paper proposes constructing a vulnerability model (VM) with a small number of fault injections to identify vulnerable model parameters in DNN. We reduce the number of bit locations for fault injection significantly and develop a flow to incrementally collect the training data, i.e., the fault injection results, for VM accuracy improvement. Experimental results show that VM can estimate vulnerabilities of all DNN model parameters only with 1/3490 computations compared with traditional fault injection-based vulnerability estimation. |
IP.1_3 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_3.1 | EXPLOITING ARBITRARY PATHS FOR THE SIMULATION OF QUANTUM CIRCUITS WITH DECISION DIAGRAMS Speaker: Lukas Burgholzer, Johannes Kepler University Linz, Austria, AT Authors: Lukas Burgholzer, Alexander Ploier and Robert Wille, Johannes Kepler University Linz, AT Abstract The classical simulation of quantum circuits is essential in the development and testing of quantum algorithms. Methods based on tensor networks or decision diagrams have proven to alleviate the inevitable exponential growth of the underlying complexity in many cases. But the complexity of these methods is very sensitive to so-called contraction plans or simulation paths, respectively, which define the order in which the respective operations are applied. While a plethora of strategies has been developed for tensor networks, simulation based on decision diagrams has thus far mostly been conducted in a straightforward fashion. In this work, we envision a flow that allows strategies from the domain of tensor networks to be translated to decision diagrams. Preliminary results indicate that a substantial advantage may be gained by employing suitable simulation paths, motivating a thorough consideration. |
IP.1_3.2 | A NOVEL NEUROMORPHIC PROCESSORS REALIZATION OF SPIKING DEEP REINFORCEMENT LEARNING FOR PORTFOLIO MANAGEMENT Speaker: Seyyed Amirhossein Saeidi, Amirkabir University of Technology (Tehran Polytechnic), IR Authors: Seyyed Amirhossein Saeidi, Forouzan Fallah, Soroush Barmaki and Hamed Farbeh, Amirkabir University of Technology, IR Abstract The process of constantly reallocating budgets into financial assets, aiming to increase the anticipated return of assets and minimizing the risk, is known as portfolio management. Processing speed and energy consumption of portfolio management have become crucial as the complexity of their real-world applications increasingly involves high-dimensional observation and action spaces and environment uncertainty, which their limited onboard resources cannot offset. Emerging neuromorphic chips inspired by the human brain increase processing speed by up to 500 times and reduce power consumption by several orders of magnitude. This paper proposes a spiking deep reinforcement learning (SDRL) algorithm that can predict financial markets based on unpredictable environments and achieve the defined portfolio management goal of profitability and risk reduction. This algorithm is optimized for Intel’s Loihi neuromorphic processor and provides 186x and 516x energy consumption reduction compared to a high-end processor and GPU, respectively. In addition, a 1.3x and 2.0x speed-up is observed over the high-end processors and GPUs, respectively. The evaluations are performed on cryptocurrency market benchmark between 2016 and 2021. |
IP.1_3.3 | IN-SITU TUNING OF PRINTED NEURAL NETWORKS FOR VARIATION TOLERANCE Speaker: Mehdi Tahoori, Karlsruhe Institute of Technology, DE Authors: Michael Hefenbrock, Dennis Weller, Jasmin Aghassi, Michael Beigl and Mehdi Tahoori, Karlsruhe Institute of Technology, DE Abstract Printed electronics (PE) can meet requirements on cost, conformity, and non-toxicity in many application domains that silicon-based computing systems cannot achieve. A typical computational task to be performed in many such applications is classification. Therefore, printed Neural Networks (pNNs) have been proposed to meet these requirements. However, PE suffers from high process variations due to low-resolution printing in low-cost additive manufacturing. This can severely impact the inference accuracy of pNNs. In this work, we show how a unique feature of PE, namely additive printing, can be leveraged to perform in-situ tuning of pNNs to compensate for accuracy losses induced by device variations. The experiments show that, even under 30% variation of the conductances, up to 90% of the initial accuracy can be recovered. |
IP.1_4 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_4.1 | PRACTICAL IDENTITY RECOGNITION USING WIFI'S CHANNEL STATE INFORMATION Speaker: Cristian Turetta, University of Verona, IT Authors: Cristian Turetta1, Florenc Demrozi1, Philipp H. Kindt2, Alejandro Masrur3 and Graziano Pravadelli1 1Università di Verona, IT; 2TU Munich, DE; 3TU Chemnitz, DE Abstract Identity recognition is increasingly used to control access to sensitive data and restricted areas in industrial, healthcare, and defense settings, as well as in consumer electronics. To this end, existing approaches are typically based on collecting and analyzing biometric data and raise severe privacy concerns. Particularly when cameras are involved, users might even reject or dismiss an identity recognition system. Furthermore, iris or fingerprint scanners, cameras, microphones, etc., imply installation and maintenance costs and require the user's active participation in the recognition procedure. This paper proposes a non-intrusive identity recognition system based on analyzing WiFi's Channel State Information (CSI). We show that CSI data attenuated by a person's body and typical movements allows for reliable identification, even in a sitting posture. We further propose a lightweight deep learning algorithm trained using CSI data, which we implemented and evaluated on an embedded platform (i.e., a Raspberry Pi 4B). Our results obtained in real-world experiments suggest a high accuracy in recognizing people's identity, with a specificity of 98% and a sensitivity of 99%, while requiring a low training effort and negligible cost. |
IP.1_4.2 | A RDMA INTERFACE FOR ULTRA-FAST ULTRASOUND DATA-STREAMING OVER AN OPTICAL LINK Speaker: Andrea Cossettini, ETH Zurich, CH Authors: Andrea Cossettini, Konstantin Taranov, Christian Vogt, Michele Magno, Torsten Hoefler and Luca Benini, ETH Zürich, CH Abstract Digital ultrasound (US) probes integrate the analog-to-digital conversion directly on the probe and can be conveniently connected to commodity devices. Existing digital probes are however limited to a relatively small number of channels, do not guarantee access to the raw US data, or cannot operate at very high frame rates (e.g., due to exhaustion of computing and storage units on the receiving device). In this work, we present an open, compact, power-efficient, 192-channel digital US data acquisition system capable of streaming US data at transfer rates greater than 80 Gbps towards a host PC for ultra-high frame rate imaging (in the multi-kHz range). Our US probe is equipped with two power-efficient Field Programmable Gate Arrays (FPGAs) and is interfaced to the host PC with two optical-link 100G Ethernet connections. The high-speed performance is enabled by implementing a Remote Direct Memory Access (RDMA) communication protocol between the probe and the controlling PC, which utilizes a high-performance Non-Volatile Memory Express (NVMe) interface to store the streamed data. To the best of our knowledge, thanks to the achieved data rates, this is the first high-channel-count compact digital US platform capable of raw data streaming at frame rates of 20 kHz (for imaging at 3.5 cm depths), without the need for sparse sampling, consuming less than 40 W. |
IP.1_4.3 | ROBUST HUMAN ACTIVITY RECOGNITION USING GENERATIVE ADVERSARIAL IMPUTATION NETWORKS Speaker: Dina Hussein, Washington State University, US Authors: Dina Hussein1, Aaryan Jain2 and Ganapati Bhat1 1Washington State University, US; 2Nikola Tesla STEM High School, US Abstract Human activity recognition (HAR) is widely used in applications ranging from activity tracking to rehabilitation of patients. HAR classifiers are typically trained with data collected from a known set of users while assuming that all the sensors needed for activity recognition are working perfectly and there are no missing samples. However, real-world usage of the HAR classifier may encounter missing data samples due to user error, device error, or battery limitations. The missing samples, in turn, lead to a significant reduction in accuracy. To address this limitation, we propose an adaptive method that either uses low-power mean imputation or generative adversarial imputation networks (GAIN) to recover the missing data samples before classifying the activities. Experiments on a public HAR dataset with 22 users show that the proposed robust HAR classifier achieves 94% classification accuracy with as much as 20% missing samples from the sensors with 390 µJ energy consumption per imputation. |
IP.1_5 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_5.1 | HYPERX: A HYBRID RRAM-SRAM PARTITIONED SYSTEM FOR ERROR RECOVERY IN MEMRISTIVE XBARS Speaker: Adarsh Kosta, Purdue University, US Authors: Adarsh Kosta, Efstathia Soufleri, Indranil Chakraborty, Amogh Agrawal, Aayush Ankit and Kaushik Roy, Purdue University, US Abstract Memristive crossbars based on Non-volatile Memory (NVM) technologies such as RRAM, have recently shown great promise for accelerating Deep Neural Networks (DNNs). They achieve this by performing efficient Matrix-Vector-Multiplications (MVMs) while offering dense on-chip storage and minimal off-chip data movement. However, their analog nature of computing introduces functional errors due to non-ideal RRAM devices, significantly degrading the application accuracy. Further, RRAMs suffer from low endurance and high write costs, hindering on-chip trainability. To alleviate these limitations, we propose HyperX, a hybrid RRAM-SRAM system that leverages the complementary benefits of NVM and CMOS technologies. Our proposed system consists of a fixed RRAM block offering area and energy-efficient MVMs and an SRAM block enabling on-chip training to recover the accuracy drop due to the RRAM non-idealities. The improvements are reported in terms of energy and product of latency and area (ms x mm^2), termed as area-normalized latency. Our experiments on CIFAR datasets using ResNet-20 show up to 2.88x and 10.1x improvements in inference energy and area-normalized latency, respectively. In addition, for a transfer learning task from ImageNet to CIFAR datasets using ResNet-18, we observe up to 1.58x and 4.48x improvements in energy and area-normalized latency, respectively. These improvements are with respect to an all-SRAM baseline. |
IP.1_5.2 | A RESOURCE-EFFICIENT SPIKING NEURAL NETWORK ACCELERATOR SUPPORTING EMERGING NEURAL ENCODING Speaker: Daniel Gerlinghoff, Agency for Science, Technology and Research, SG Authors: Daniel Gerlinghoff1, Zhehui Wang1, Xiaozhe Gu2, Rick Siow Mong Goh1 and Tao Luo1 1Agency for Science, Technology and Research, SG; 2Chinese University of Hong Kong, Shenzhen, CN Abstract Spiking neural networks (SNNs) recently gained momentum due to their low-power, multiplication-free computing and their closer resemblance to biological processes in the human nervous system. However, SNNs require very long spike trains (up to 1000) to reach an accuracy similar to their artificial neural network (ANN) counterparts for large models, which offsets their efficiency and inhibits their application to low-power systems for real-world use cases. To alleviate this problem, emerging neural encoding schemes have been proposed to shorten the spike train while maintaining high accuracy. However, current SNN accelerators cannot adequately support these emerging encoding schemes. In this work, we present a novel hardware architecture that can efficiently support SNNs with emerging neural encoding. Our implementation features energy- and area-efficient processing units with increased parallelism and reduced memory accesses. We verified the accelerator on an FPGA and achieve 25% and 90% improvement over previous work in power consumption and latency, respectively. At the same time, high area efficiency allows us to scale to large neural network models. To the best of our knowledge, this is the first work to deploy the large neural network model VGG on physical FPGA-based neuromorphic hardware. |
IP.1_5.3 | SCALABLE HARDWARE ACCELERATION OF NON-MAXIMUM SUPPRESSION Speaker: Chunyun Chen, Nanyang Technological University, SG Authors: Chunyun Chen1, Tianyi Zhang2, Zehui Yu1, Adithi Raghuraman1, Shwetalaxmi Udayan1, Jie Lin2 and Mohamed Aly1 1Nanyang Technological University, SG; 2Institute for Infocomm Research, ASTAR, SG Abstract Non-maximum Suppression (NMS) in one- and two-stage object detection deep neural networks (e.g., SSD and Faster-RCNN) is becoming the computation bottleneck. In this paper, we introduce a hardware accelerator for the scalable PSRR-MaxpoolNMS algorithm. Our architecture shows 75.0× and 305× speedups compared to the software implementation of PSRR-MaxpoolNMS as well as the hardware implementations of GreedyNMS, respectively, while simultaneously achieving Mean Average Precision (mAP) comparable to software-based floating-point implementations. Our architecture is 13.4× faster than the state-of-the-art NMS accelerator. Our accelerator supports both one- and two-stage detectors, while supporting very high input resolutions (i.e., FHD), an essential input size for better detection accuracy. |
IP.1_6 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_6.1 | ACTIVE LEARNING OF ABSTRACT SYSTEM MODELS FROM TRACES USING MODEL CHECKING Speaker: Natasha Yogananda Jeppu, University of Oxford, GB Authors: Natasha Yogananda Jeppu1, Tom Melham1 and Daniel Kroening2 1University of Oxford, GB; 2Amazon, Inc, GB Abstract We present a new active model-learning approach to generating abstractions of a system implementation, as finite state automata (FSAs), from execution traces. Given an implementation and a set of observable system variables, the generated automata admit all system behaviours over the given variables and provide useful insight in the form of invariants that hold on the implementation. To achieve this, the proposed approach uses a pluggable model learning component that can generate an FSA from a given set of traces. Conditions that encode a completeness hypothesis are then extracted from the FSA under construction and used to evaluate its degree of completeness by checking their truth value against the system using software model checking. This generates new traces that express any missing behaviours. The new trace data is used to iteratively refine the abstraction, until all system behaviours are admitted by the learned abstraction. To evaluate the approach, we reverse-engineer a set of publicly available Simulink Stateflow models from their C implementations. |
IP.1_6.2 | REDUCING THE CONFIGURATION OVERHEAD OF THE DISTRIBUTED TWO-LEVEL CONTROL SYSTEM Speaker: Yu Yang, KTH Royal Institute of Technology, SE Authors: Yu Yang, Dimitrios Stathis and Ahmed Hemani, KTH Royal Institute of Technology, SE Abstract With the growing demand for more efficient hardware accelerators for streaming applications, a novel Coarse-Grained Reconfigurable Architecture (CGRA) that uses a Distributed Two-Level Control (D2LC) system has been proposed in the literature. Even though its highly distributed and parallel structure makes it fast and energy-efficient, the single-issue instruction channel between the level-1 and level-2 controllers in each D2LC cell becomes the bottleneck of its performance. In this paper, we improve the design to mimic a multi-issue architecture by inserting shadow instruction buffers between the level-1 and level-2 controllers. Together with a zero-overhead hardware loop, the improved D2LC architecture enables efficient overlap between loop iterations. We also propose a complete constraint-programming-based instruction scheduling algorithm to support the above hardware features. The experimental results show that the improved D2LC architecture can achieve up to a 25% reduction in instruction execution cycles and a 35% reduction in energy-delay product. |
IP.1_6.3 | BATCHLENS: A VISUALIZATION APPROACH FOR ANALYZING BATCH JOBS IN CLOUD SYSTEMS Speaker: Qiang Guan, Kent State University, US Authors: Shaolun Ruan1, Yong Wang1, Hailong Jiang2, Weijia Xu3 and Qiang Guan2 1Singapore Management University, SG; 2Kent State University, US; 3TACC, US Abstract Cloud systems are becoming increasingly powerful and complex. It is highly challenging to identify anomalous execution behaviors and pinpoint problems by examining the overwhelming intermediate results/states in complex application workflows. Domain scientists urgently need a friendly and functional interface to understand the quality of the computing services and the performance of their applications in real time. To meet these needs, we explore data generated by job schedulers and investigate general performance metrics (e.g., utilization of CPU, memory and disk I/O). Specifically, we propose an interactive visual analytics approach, BatchLens, to provide both providers and users of cloud service with an intuitive and effective way to explore the status of system batch jobs and help them conduct root-cause analysis of anomalous behaviors in batch jobs. We demonstrate the effectiveness of BatchLens through a case study on the public Alibaba bench workload trace datasets. |
IP.1_7 Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.1_7.1 | FLOWACC: REAL-TIME HIGH-ACCURACY DNN-BASED OPTICAL FLOW ACCELERATOR IN FPGA Speaker: Yehua Ling, Sun Yat-sen University, CN Authors: Yehua Ling, Yuanxing Yan, Kai Huang and Gang Chen, Sun Yat-sen University, CN Abstract Recently, accelerator architectures have been designed that use deep neural networks (DNNs) to accelerate computer vision tasks, possessing the advantages of both accuracy and speed. Optical flow accelerators, however, are not among the architectures in which DNNs have been successfully deployed. Existing hardware accelerators for optical flow estimation are all designed for classic methods and generally perform poorly in estimation accuracy. In this paper, we present FlowAcc, a dedicated hardware accelerator for DNN-based optical flow estimation, adopting a pipelined hardware design for real-time processing of image streams. We design an efficient multiplexing binary neural network (BNN) architecture for pyramidal feature extraction to significantly reduce the hardware cost and make it independent of the number of pyramid levels. Furthermore, efficient Hamming distance calculation and competent flow regularization are utilized for hierarchical optical flow estimation to greatly improve the system efficiency. Comprehensive experimental results demonstrate that FlowAcc achieves state-of-the-art estimation accuracy and real-time performance on the Middlebury dataset when compared with existing optical flow accelerators. |
IP.1_7.2 | ON EXPLOITING PATTERNS FOR ROBUST FPGA-BASED MULTI-ACCELERATOR EDGE COMPUTING SYSTEMS Speaker: Seyyed Ahmad Razavi, University of California, Irvine, US Authors: Seyyed Ahmad Razavi, Hsin-Yu Ting, Tootiya Giyahchi and Eli Bozorgzadeh, University of California, Irvine, US Abstract Edge computing plays a key role in providing services for emerging compute-intensive applications while bringing computation close to end devices. FPGAs have been deployed to provide custom acceleration services due to their reconfigurability and support for multi-tenancy in sharing the computing resource. This paper explores an FPGA-based Multi-Accelerator Edge Computing System that serves various DNN applications from multiple end devices simultaneously. To dynamically maximize the responsiveness to end devices, we propose a system framework that exploits patterns in application characteristics and employs a staggering module coupled with a mixed offline/online multi-queue scheduling method to alleviate resource contention and the uncertain delay caused by network delay variation. Our evaluation shows the framework can significantly improve responsiveness and robustness in serving multiple end devices. |
IP.1_7.3 | RLPLACE: DEEP RL GUIDED HEURISTICS FOR DETAILED PLACEMENT OPTIMIZATION Speaker: Uday Mallappa, UC San Diego, US Authors: Uday Mallappa1, Sreedhar Pratty2 and David Brown2 1University of California San Diego, US; 2Nvidia, US Abstract The solution space of detailed placement becomes intractable with an increase in the number of placeable cells and their possible locations. So, existing works focus either on sliding-window-based optimization or row-based optimization. Though these region-based methods enable us to use linear-programming, pseudo-greedy or dynamic-programming algorithms, locally optimal solutions from these methods are globally sub-optimal due to their inherent heuristics. Heuristics such as the order in which we choose these local problems or the size of each sliding window (runtime vs. optimality tradeoff) account for the degradation of solution quality. Our hypothesis is that learning-based techniques (with their richer representation ability) have shown great success in problems with huge solution spaces, and can offer an alternative to the existing rudimentary heuristics. We propose a two-stage detailed-placement algorithm, RLPlace, that uses reinforcement learning (RL) for coarse re-arrangement and Satisfiability Modulo Theories (SMT) for fine-grain refinement. With the global placement output of two critical IPs as the start point, RLPlace achieves up to 1.35% HPWL improvement compared to the commercial tool's detailed-placement result. In addition, RLPlace shows at least 1.2% HPWL improvement over highly optimized detailed-placement variants of the two IPs. |
IP.ASD Interactive presentations
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 11:30 CET - 12:15 CET
Session chair:
Philipp Mundhenk, Bosch, DE
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.ASD.1 | DEADLOCK ANALYSIS AND PREVENTION FOR INTERSECTION MANAGEMENT BASED ON COLORED TIMED PETRI NETS Speaker: Tsung-Lin Tsou, National Taiwan University, TW Authors: Tsung-Lin Tsou, Chung-Wei Lin and Iris Hui-Ru Jiang, National Taiwan University, TW Abstract We propose a Colored Timed Petri Net (CTPN) based model for intersection management. With the expressiveness of the CTPN-based model, we can consider timing, vehicle-specific information, and different types of vehicles. We then design deadlock-free policies and guarantee deadlock-freeness for intersection management. To the best of our knowledge, this is the first work on CTPN-based deadlock analysis and prevention for intersection management. |
IP.ASD.2 | ATTACK DATA GENERATION FRAMEWORK FOR AUTONOMOUS VEHICLE SENSORS Speaker: Jan Lauinger, TU Munich, DE Authors: Jan Lauinger1, Andreas Finkenzeller1, Henrik Lautebach2, Mohammad Hamad1 and Sebastian Steinhorst1 1TU Munich, DE; 2ZF Group, DE Abstract Driving scenarios of autonomous vehicles combine many data sources with new networking requirements in highly dynamic system setups. To keep security mechanisms applicable to new application fields in the automotive domain, our work introduces a security framework to generate, attack, and validate realistic data sets at rest and in transit. Concerning realistic data sets, our framework leverages autonomous driving simulators as well as static data sets of vehicle sensors. A configurable networking setup enables flexible data encapsulation to perform and validate networking attacks on data in transit. We validate our results with intrusion detection algorithms and simulation environments. Generated data sets and configurations are reproducible, portable, storable, and support iterative security testing of scenarios. |
IP.ASD.3 | CONTRACT-BASED QUALITY-OF-SERVICE ASSURANCE IN DYNAMIC DISTRIBUTED SYSTEMS Speaker: Lea Schönberger, TU Dortmund University, DE Authors: Lea Schönberger1, Susanne Graf2, Selma Saidi3, Dirk Ziegenbein4 and Arne Hamann4 1TU Dortmund University, DE; 2University Grenoble Alpes, CNRS, FR; 3TU Dortmund, DE; 4Robert Bosch GmbH, DE Abstract To offer an infrastructure for autonomous systems offloading parts of their functionality, dynamic distributed systems must be able to satisfy non-functional quality-of-service (QoS) requirements. However, providing hard QoS guarantees that hold even under uncertain conditions, without complex global verification, is very challenging. In this work, we propose contract-based QoS assurance for centralized, hierarchical systems, which requires local verification only and has the potential to cope with dynamic changes and uncertainties. |
K.5 Lunch Keynote: "Probabilistic and Deep Learning Techniques for Robot Navigation and Automated Driving"
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 13:00 CET - 13:50 CET
Session chair:
Rolf Ernst, TU Braunschweig, DE
Session co-chair:
Selma Saidi, TU Dortmund, DE
For autonomous robots and automated driving, the capability to robustly perceive environments and execute their actions is the ultimate goal. The key challenge is that no sensors and actuators are perfect, which means that robots and cars need the ability to properly deal with the resulting uncertainty. In this presentation, I will introduce the probabilistic approach to robotics, which provides a rigorous statistical methodology to deal with state estimation problems. I will furthermore discuss how this approach can be extended using state-of-the-art technology from machine learning to deal with complex and changing real-world environments.
Speaker's bio: Wolfram Burgard is a Professor for Robotics and Artificial Intelligence at the Technical University of Nuremberg. His interests lie in Robotics, Artificial Intelligence, Machine Learning, and Computer Vision. He has published over 400 publications, more than 15 of which received best paper awards. In 2009, he was awarded the Gottfried Wilhelm Leibniz Prize, the most prestigious German research award. In 2010, he received an Advanced Grant from the European Research Council. In 2021, he received the IEEE Technical Field Award for Robotics and Automation. He is a Fellow of the IEEE, the AAAI, the EurAI, and a member of the German Academy of Sciences Leopoldina as well as of the Heidelberg Academy of Sciences and Humanities.
Time | Label | Presentation Title Authors |
---|---|---|
13:00 CET | K.5.1 | PROBABILISTIC AND DEEP LEARNING TECHNIQUES FOR ROBOT NAVIGATION AND AUTOMATED DRIVING Speaker and Author: Wolfram Burgard, TU Nuremberg, DE Abstract For autonomous robots and automated driving, the capability to robustly perceive environments and execute their actions is the ultimate goal. The key challenge is that no sensors and actuators are perfect, which means that robots and cars need the ability to properly deal with the resulting uncertainty. In this presentation, I will introduce the probabilistic approach to robotics, which provides a rigorous statistical methodology to deal with state estimation problems. I will furthermore discuss how this approach can be extended using state-of-the-art technology from machine learning to deal with complex and changing real-world environments. |
11.1 Analog / mixed-signal EDA from system level to layout level
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Manuel Barragan, Universite Grenoble Alpes, CNRS, Grenoble INP, TIMA, FR
Session co-chair:
Lars Hedrich, Goethe University of Frankfurt/Main, DE
The first paper in the session explores the high-level design of a mixed-signal system. Topology generation and sizing for an OPAMP is discussed next. The following four papers deal with various issues in placement: placement guided by circuit simulations, a discussion of models for placement, routability issues, and finally placement and routing of capacitor arrays.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 11.1.1 | EFFICSENSE: AN ARCHITECTURAL PATHFINDING FRAMEWORK FOR ENERGY-CONSTRAINED SENSOR APPLICATIONS Speaker: Jonah Van Assche, KU Leuven, BE Authors: Jonah Van Assche, Ruben Helsen and Georges Gielen, KU Leuven, BE Abstract This paper introduces EffiCSense, an architectural pathfinding framework for mixed-signal sensor front-ends for both regular and compressive sensing systems. Since sensing systems are often energy constrained, finding a suitable architecture can be a long iterative process between high-level modeling and circuit design. We present a Simulink-based framework that allows for architectural pathfinding with high-level functional models while also including power consumption models of the different circuit blocks. This makes it possible to directly model the impact of design specifications on power consumption and speeds up the overall design process significantly. Both architectures with and without compressive sensing can be handled. The framework is demonstrated for the processing of EEG signals for epilepsy detection, comparing solutions with and without analog compressive sensing. Simulations show that, using compression, an optimal design can be found that is estimated to be 3.6 times more power-efficient than a system without compression, consuming 2.44 µW at a detection accuracy of 99.3%. |
14:34 CET | 11.1.2 | TOPOLOGY OPTIMIZATION OF OPERATIONAL AMPLIFIER IN CONTINUOUS SPACE VIA GRAPH EMBEDDING Speaker: Jialin Lu, Fudan University, CN Authors: Jialin Lu, Liangbo Lei, Fan Yang, Li Shang and Xuan Zeng, Fudan University, CN Abstract The operational amplifier is a key building block in analog circuits. However, the design process of the operational amplifier is complex and time-consuming, as there are no practical automation tools available in the industry. This paper presents a new topology optimization method for operational amplifiers. The behavior of the operational amplifier is described using a directed acyclic graph (DAG), which is then transformed into a low-dimensional embedding in continuous space using a variational graph autoencoder. Topology search is performed in the continuous embedding space using stochastic optimization methods, such as Bayesian Optimization. The search results are then transformed back to operational amplifier topologies using a graph decoder. The proposed method is also equipped with a surrogate model for performance prediction. Experimental results show that the proposed approach can achieve significant speedup over genetic search algorithms. The produced three-stage operational amplifiers offer competitive performance compared to manual designs. |
14:38 CET | 11.1.3 | A CHARGE FLOW FORMULATION FOR GUIDING ANALOG/MIXED-SIGNAL PLACEMENT Speaker: Tonmoy Dhar, University of Minnesota Twin Cities, US Authors: Tonmoy Dhar1, Ramprasath S2, Jitesh Poojary2, Soner Yaldiz3, Steven Burns3, Ramesh Harjani2 and Sachin S. Sapatnekar2 1University of Minnesota Twin Cities, US; 2University of Minnesota, US; 3Intel Corporation, US Abstract An analog/mixed-signal designer typically performs circuit optimization, involving intensive SPICE simulations, on a schematic netlist and then sends the optimized netlist to layout. During the layout phase, it is vital to maintain symmetry requirements to avoid performance degradation due to mismatch: these constraints are usually specified using user input or by invoking an external tool. Moreover, to achieve high performance, the layout must avoid large interconnect parasitics on critical nets. Prior works that optimize parasitics during placement work with coarse metrics such as the half-perimeter wire length, but these metrics do not appropriately emphasize performance-critical nets. The novel charge flow (CF) formulation in this work addresses both symmetry detection and parasitic optimization. By leveraging schematic-level simulations, which are available “for free” from the circuit optimization step, the approach (a) alters the objective function to emphasize the reduction of parasitics on performance-critical nets, and (b) identifies symmetric elements/element groups. The effectiveness of the CF-based approach is demonstrated on a variety of circuits within a stochastic placement engine. |
14:42 CET | 11.1.4 | (Best Paper Award Candidate) ARE ANALYTICAL TECHNIQUES WORTHWHILE FOR ANALOG IC PLACEMENT? Speaker: Yishuang Lin, Texas A&M University, US Authors: Yishuang Lin1, Yaguang Li1, Donghao Fang1, Meghna Madhusudan2, Sachin S. Sapatnekar2, Ramesh Harjani2 and Jiang Hu1 1Texas A&M University, US; 2University of Minnesota, US Abstract Analytical techniques have long been a prevailing approach to digital IC placement due to their advantage in handling huge problem sizes. Recently, they were adopted for analog IC placement, where prior methods were mostly based on simulated annealing. However, a comparative study between the two approaches has been lacking. Moreover, the impact of different analytical techniques is not clear. This work attempts to shed light on both issues by studying existing methods and developing a new analytical technique. Circuit performance is a critical concern for automated analog layout. To this end, we propose a performance driven analytical analog placement technique, which has not been studied in the past to the best of our knowledge. Experiments were performed on various testcase circuits. For the conventional formulation without considering performance, the proposed analytical technique achieves 55 times speedup and 12% wirelength reduction compared to simulated annealing. For performance driven placement, the proposed technique outperforms simulated annealing in terms of circuit performance, area and runtime. Moreover, the proposed technique generally provides better solution quality than a recent previous analytical technique. |
14:46 CET | 11.1.5 | ROUTABILITY-AWARE PLACEMENT FOR ADVANCED FINFET MIXED-SIGNAL CIRCUITS USING SATISFIABILITY MODULO THEORIES Speaker: Hao Chen, University of Texas at Austin, US Authors: Hao Chen1, Walker Turner2, David Z. Pan1 and Haoxing Ren2 1University of Texas at Austin, US; 2NVIDIA Corporation, US Abstract Due to the increasingly complex design rules and geometric layout constraints within advanced FinFET nodes, automated placement of full-custom analog/mixed-signal (AMS) designs has become increasingly challenging. Compared with traditional planar nodes, AMS circuit layout is dramatically different for FinFET technologies due to strict design rules and grid-based restrictions for both placement and routing. This limits previous analog placement approaches in effectively handling all of the new constraints while adhering to the new layout style. Additionally, limited work has demonstrated effective routability modeling, which is crucial for successful routing. This paper presents a robust analog placement framework using satisfiability modulo theories (SMT) for efficient constraint handling and routability modeling. Experimental results based on industrial designs show the effectiveness of the proposed framework in optimizing placement metrics while satisfying the specified constraints. |
14:50 CET | 11.1.6 | CONSTRUCTIVE COMMON-CENTROID PLACEMENT AND ROUTING FOR BINARY-WEIGHTED CAPACITOR ARRAYS Speaker: Nibedita Karmokar, University of Minnesota, Twin Cities, US Authors: Nibedita Karmokar, Arvind Kumar Sharma, Jitesh Poojary, Meghna Madhusudan, Ramesh Harjani and Sachin S. Sapatnekar, University of Minnesota, US Abstract The accuracy and linearity of capacitive digital-to-analog converters (DACs) depend on precise capacitor ratios, but these ratios are perturbed by process variations and parasitics. This paper develops fast constructive procedures for common-centroid placement and routing for binary-weighted capacitors in charge-sharing DACs. Parasitics also degrade the switching speed of a capacitor array, particularly in FinFET nodes with severe wire/via resistances. To overcome this, the capacitor array is placed and routed to optimize switching speed, measured by the 3dB frequency. A balance between 3dB frequency and DAC INL/DNL is shown by trading off via counts with dispersion. The approach delivers high-quality results with low runtimes. |
14:54 CET | 11.1.7 | Q&A SESSION Authors: Manuel Barragan1 and Lars Hedrich2 1Universite Grenoble Alpes, CNRS, Grenoble INP, TIMA, FR; 2Goethe University of Frankfurt/Main, DE Abstract Questions and answers with the authors |
11.2 Approximate Computing Everywhere
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Jie Han, University of Alberta, CA
Session co-chair:
Ilaria Scarabottolo, Università della Svizzera, CH
New automated synthesis and optimization methods targeting approximate circuits are presented in the first part of this session. Application papers then deal with new approximation techniques developed for deep neural network accelerators, printed circuit optimization, speech processing, and approximate solutions for stochastic computing. The first paper introduces a new logic synthesis method, which utilizes formal verification engines to generate approximate circuits satisfying quality constraints by construction. The second paper presents a method for optimizing approximate compressor trees in multipliers. The third paper addresses a method developed to automatically generate approximate low-power deep learning accelerators based on TPUs. A new application of approximate computing – the optimization of printed circuits – is introduced in the fourth paper. The fifth paper proposes a speech recognition ASIC based on a target-separable binarized weight network, capable of performing speaker verification and keyword spotting. The authors of the last paper combine approximate and stochastic computing principles in coarse-grained reconfigurable architectures to reduce circuit complexity and power consumption. IP papers deal with a probabilistic-oriented approximate computing method for DNN accelerators and a Learned Approximate Computing method capable of tuning the application parameters to maximize the output quality without changing the computation.
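As a frame of reference for the error metrics that recur in this session (for example the mean error distance of an approximate multiplier), the sketch below exhaustively evaluates a deliberately naive truncation-based approximate multiplier in Python. It is a generic illustration with assumed bit widths, not a design from any of the papers:

```python
# Exhaustive error evaluation of a toy approximate multiplier (illustration only).
# The "approximation" simply zeroes the k least-significant bits of both operands
# before multiplying, a deliberately naive stand-in for the compressor-tree and
# synthesis techniques discussed in the papers of this session.

BITS, K = 8, 2                      # 8-bit operands, 2 truncated LSBs (assumed)
MASK = ~((1 << K) - 1)

def approx_mul(a, b):
    return (a & MASK) * (b & MASK)

errors = [abs(a * b - approx_mul(a, b))
          for a in range(1 << BITS) for b in range(1 << BITS)]

med = sum(errors) / len(errors)     # mean error distance over all input pairs
wce = max(errors)                   # worst-case error
print(f"MED = {med:.1f}, worst-case error = {wce}")
```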
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 11.2.1 | MUSCAT: MUS-BASED CIRCUIT APPROXIMATION TECHNIQUE Speaker: Linus Witschen, Paderborn University, DE Authors: Linus Witschen, Tobias Wiersema, Matthias Artmann and Marco Platzner, Paderborn University, DE Abstract Many applications show an inherent resiliency against inaccuracies and errors in their computations. The design paradigm approximate computing exploits this fact by trading off the application’s accuracy against a target metric, e.g., hardware area. This work focuses on approximate computing on the hardware level, where approximate logic synthesis seeks to generate approximate circuits under user-defined quality constraints. We propose the novel approximate logic synthesis method MUSCAT to generate approximate circuits which are valid-by-construction. MUSCAT inserts cutpoints into the netlist to employ the commonly-used concept of substituting connections between gates by constant values, which offers potential for subsequent logic minimization. MUSCAT’s novelty lies in utilizing formal verification engines to identify minimal unsatisfiable subsets. These subsets determine a maximal number of cutpoints that can be activated together without resulting in a violation against the user-defined quality constraints. As a result, MUSCAT determines an optimal solution w.r.t. the number of activated cutpoints while providing a guarantee on the quality constraints. We present the method and experimentally compare MUSCAT’s open-source implementation to AIG rewriting and components from the EvoApproxLib. We show that our method improves upon these state-of-the-art methods by achieving up to 80 % higher savings in circuit area at typically much lower computation times. |
14:34 CET | 11.2.2 | OPACT: OPTIMIZATION OF APPROXIMATE COMPRESSOR TREE FOR APPROXIMATE MULTIPLIER Speaker: Xiao Weihua, Shanghai Jiao Tong University, CN Authors: Weihua Xiao1, Cheng Zhuo2 and Weikang Qian1 1Shanghai Jiao Tong University, CN; 2Zhejiang University, CN Abstract Approximate multipliers have attracted significant attention from researchers for designing low-power systems. The most area-consuming part of a multiplier is its compressor tree (CT). Hence, the prior works proposed various approximate compressors to reduce the area of the CT. However, the compression strategy for the approximate compressors has not been systematically studied: Most of the prior works apply their ad hoc strategies to arrange approximate compressors. In this work, we propose OPACT, a method for optimizing the approximate compressor tree of an approximate multiplier. An integer linear programming problem is first formulated to co-optimize CT’s area and error. Moreover, since different connection orders of the approximate compressors can affect the error of an approximate multiplier, we formulate another mixed-integer programming problem for optimizing the connection order. The experimental results showed that OPACT can produce approximate multipliers with an average reduction of 24.4% and 8.4% in power-delay product and mean error distance, respectively, compared to the best existing designs with the same types of approximate compressors used. |
14:38 CET | 11.2.3 | LEARNING TO DESIGN ACCURATE DEEP LEARNING ACCELERATORS WITH INACCURATE MULTIPLIERS Speaker: Paras Jain, UC Berkeley, US Authors: Paras Jain1, Safeen Huda2, Martin Maas3, Joseph Gonzalez1, Ion Stoica1 and Azalia Mirhoseini4 1UC Berkeley, US; 2University of Toronto, CA; 3Google, Inc., US; 4Google, US Abstract Approximate computing is a promising way to improve the power efficiency of deep learning. While recent work proposes new arithmetic circuits (adders and multipliers) that consume substantially less power at the cost of computation errors, these approximate circuits decrease the end-to-end accuracy of common models. We present AutoApprox, a framework to automatically generate approximate low-power deep learning accelerators without any accuracy loss. AutoApprox generates a wide range of approximate ASIC accelerators with a TPUv3 systolic-array template. AutoApprox uses a learned router to assign each DNN layer to an approximate systolic array from a bank of arrays with varying approximation levels. By tailoring this routing for a specific neural network architecture, we discover circuit designs without the accuracy penalty from prior methods. Moreover, AutoApprox optimizes for the end-to-end performance, power and area of the whole chip and PE mapping rather than simply measuring the performance of the arithmetic units in isolation. To our knowledge, our work is the first to demonstrate the effectiveness of custom-tailored approximate circuits in delivering significant chip-level energy savings with zero accuracy loss on a large-scale dataset such as ImageNet. AutoApprox synthesizes a novel approximate accelerator based on the TPU that reduces end-to-end power consumption by 3.2% and area by 5.2% at a sub-10nm process with no degradation in ImageNet validation top-1 and top-5 accuracy. |
14:42 CET | 11.2.4 | (Best Paper Award Candidate) CROSS-LAYER APPROXIMATION FOR PRINTED MACHINE LEARNING CIRCUITS Speaker: Giorgos Armeniakos, NTUA / KIT, GR Authors: Giorgos Armeniakos1, Georgios Zervakis2, Dimitrios Soudris3, Mehdi Tahoori2 and Joerg Henkel4 1National Technical University of Athens, GR; 2Karlsruhe Institute of Technology, DE; 3National TU Athens, GR; 4Karlsruhe Institute of Technology, DE Abstract Printed electronics (PE) feature low non-recurring engineering costs and low per unit-area fabrication costs, thus enabling extremely low-cost and on-demand hardware. Such low-cost fabrication allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. However, even with bespoke architectures, the large feature sizes in PE constrain the complexity of the ML models that can be implemented. In this work, we bring together, for the first time, approximate computing and PE design, aiming to enable complex ML models, such as Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), in PE. To this end, we propose and implement a cross-layer approximation, tailored for bespoke ML architectures. At the algorithmic level we apply a hardware-driven coefficient approximation of the ML model and at the circuit level we apply a netlist pruning through a full search exploration. In our extensive experimental evaluation we consider 14 MLPs and SVMs and evaluate more than 4300 approximate and exact designs. Our results demonstrate that our cross-layer approximation delivers Pareto optimal designs that, compared to the state-of-the-art exact designs, feature 47% and 44% average area and power reduction, respectively, and less than 1% accuracy loss. |
14:46 CET | 11.2.5 | A TARGET-SEPARABLE BWN INSPIRED SPEECH RECOGNITION PROCESSOR WITH LOW-POWER PRECISION-ADAPTIVE APPROXIMATE COMPUTING Speaker: Bo Liu, Southeast University, CN Authors: Bo Liu1, Hao Cai1, Xuan Zhang1, Haige Wu1, Anfeng Xue1, Zilong Zhang1, Zhen Wang2 and Jun Yang1 1Southeast University, CN; 2Nanjing Prochip Electronic Technology Co. Ltd, CN Abstract This paper proposes a speech recognition processor based on a target-separable binarized weight network (BWN), capable of performing both speaker verification (SV) and keyword spotting (KWS). In a traditional speech recognition system, the SV based on a traditional model and the KWS based on a neural network (NN) model are two independent hardware modules. In this work, both SV and KWS are processed by the proposed BWN with a unified training and optimization framework which can be applied to various application scenarios. Through system-architecture co-design, SV and KWS share most of the feature extraction network parameters, and the classification part is calculated separately according to the different targets. An energy-efficient NN accelerator which can be dynamically reconfigured to process different layers of the BWN with splitting calculation of the frequency domain convolution is proposed. SV and KWS can be achieved with only a single calculation per input speech frame, which greatly improves the computing energy efficiency. The computing units of the NN accelerator are optimized using a precision-adaptive approximate addition tree architecture with a Dual-VDD method to further reduce the energy cost. Compared to the state of the art, this work can achieve about 4x reduction in power consumption while maintaining high system adaptability and accuracy. |
14:50 CET | 11.2.6 | TOWARDS ENERGY-EFFICIENT CGRAS VIA STOCHASTIC COMPUTING Speaker: Bo Wang, Chongqing University, CN Authors: Bo Wang1, Rong Zhu1, Jiaxing Shang2 and Dajiang Liu1 1Chongqing University, CN; 2Chongqing University, CN Abstract Stochastic computing (SC) is a promising computing paradigm for low-power and low-cost applications with the added benefit of high error tolerance. Meanwhile, Coarse-Grained Reconfigurable Architecture (CGRA) is also a promising platform for domain-specific applications for its combination of energy efficiency and flexibility. Intuitively, introducing SC to CGRA would synergistically reinforce the strengths of both paradigms. Accordingly, this paper proposes an SC-based CGRA by replacing the exact multiplication in a traditional CGRA with an SC-based multiplication, where both accuracy and latency are improved using parallel stochastic sequence generators and leading-zero shifters. In addition, with the flexible connections among PEs, high-accuracy operation can be easily achieved by combining neighboring PEs without switching costs like power-gating. Compared to the state-of-the-art approximate computing design of CGRA, our proposed CGRA achieves 16% more energy reduction and a 34% energy efficiency improvement while keeping high configuration flexibility. |
14:54 CET | 11.2.7 | Q&A SESSION Authors: Jie Han1 and Ilaria Scarabottolo2 1University of Alberta, CA; 2USI Lugano, CH Abstract Questions and answers with the authors |
11.3 Advanced Mapping and Optimization for Emerging ML Hardware
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Jan Moritz Joseph, RWTH AACHEN, DE
Session co-chair:
Elnaz Ansari, Meta/Facebook, US
In this session, we present six papers on advanced optimization techniques for model-mapping-hardware co-design. The first paper focuses on optimizing the performance of sparse ML models with conventional DRAM. The second broadens the scope to emerging PIM architectures. The third introduces a latency model for diverse hardware architectures. The last three papers introduce novel evolutionary/genetic algorithmic methods for co-optimizing the model, mapping, and hardware.
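To give a concrete flavour of analytical cost models and evolutionary mapping search (without reproducing any of the frameworks below), here is a minimal Python sketch that evolves loop-tiling factors against an invented cycle-count model; all dimensions, buffer sizes, and cost terms are assumed:

```python
import random

# Toy mapping search (illustration only): choose tiling factors for a GEMM-like
# layer so that a tile fits in a small on-chip buffer while keeping a coarse
# analytical cycle estimate low. All model parameters below are assumed.

M, N, K = 256, 256, 256            # layer dimensions
PES, BUFFER = 64, 4096             # number of PEs and buffer capacity in words
FACTORS = (2, 4, 8, 16, 32, 64)    # candidate tiling factors (all divide 256)

def cycles(tm, tn, tk):
    """Very coarse analytical cost: parallel compute time plus a refill penalty."""
    tile_words = tm * tk + tk * tn + tm * tn
    if tile_words > BUFFER:
        return float("inf")                      # mapping does not fit on chip
    n_tiles = (M // tm) * (N // tn) * (K // tk)
    compute = (tm * tn * tk) / PES               # ideal compute cycles per tile
    refill = tile_words                          # one word per cycle to refill the buffer
    return n_tiles * (compute + refill)

def mutate(parent):
    return tuple(g if random.random() < 0.7 else random.choice(FACTORS) for g in parent)

# Simple elitist evolutionary loop over (tm, tn, tk) mappings.
population = [tuple(random.choice(FACTORS) for _ in range(3)) for _ in range(16)]
for _ in range(30):
    parents = sorted(population, key=lambda m: cycles(*m))[:4]
    population = parents + [mutate(random.choice(parents)) for _ in range(12)]

best = min(population, key=lambda m: cycles(*m))
print("best tiling (tm, tn, tk):", best, "-> estimated cycles:", cycles(*best))
```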
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 11.3.1 | DASC: A DRAM DATA MAPPING METHODOLOGY FOR SPARSE CONVOLUTIONAL NEURAL NETWORKS Speaker: Bo-Cheng Lai, National Yang Ming Chiao Tung University, TW Authors: Bo-Cheng Lai1, Tzu-Chieh Chiang1, Po-Shen Kuo1, Wan-Ching Wang1, Yan-Lin Hung1, Hung-Ming Chen2, Chien-Nan Liu1 and Shyh-Jye Jou1 1National Yang Ming Chiao Tung University, TW; 2Institute of Electronics, National Chiao Tung University, TW Abstract Transferring the sheer model size of a CNN (convolutional neural network) has become one of the main performance challenges in modern intelligent systems. Although pruning can trim down a substantial amount of non-effective neurons, the excessive DRAM accesses of the non-zero data in a sparse network still dominate the overall system performance. Proper data mapping can enable efficient DRAM accesses for a CNN. However, previous DRAM mapping methods focus on dense CNNs and become less effective when handling the compressed format and irregular accesses of sparse CNNs. The extensive design space search for mapping parameters also results in a time-consuming process. This paper proposes DASC: a DRAM data mapping methodology for sparse CNNs. DASC is designed to handle the data patterns and block schedule of sparse CNNs to attain good spatial locality and efficient DRAM accesses. The bank-group feature in modern DDR is further exploited to enhance processing parallelism. DASC also introduces an analytical model to facilitate fast exploration and quick convergence of parameter search in minutes instead of the days needed by previous work. When compared with the state-of-the-art, DASC decreases the total DRAM latencies and attains an average of 17.1x, 14.3x, and 14.6x better DRAM performance for sparse AlexNet, VGG-16, and ResNet-50, respectively. |
14:34 CET | 11.3.2 | VW-SDK: EFFICIENT CONVOLUTIONAL WEIGHT MAPPING USING VARIABLE WINDOWS FOR PROCESSING-IN-MEMORY ARCHITECTURES Speaker: Johnny Rhe, Sungkyunkwan University, KR Authors: Johnny Rhe, Sungmin Moon and Jong Hwan Ko, Sungkyunkwan University, KR Abstract With their high energy efficiency, processing-in-memory (PIM) arrays are increasingly used for convolutional neural network (CNN) inference. In PIM-based CNN inference, the computational latency and energy are dependent on how the CNN weights are mapped to the PIM array. A recent study proposed shifted and duplicated kernel (SDK) mapping that reuses the input feature maps with a unit of a parallel window, which is convolved with duplicated kernels to obtain multiple output elements in parallel. However, the existing SDK-based mapping algorithm does not always result in the minimum computing cycles because it only maps a square-shaped parallel window with the entire channels. In this paper, we introduce a novel mapping algorithm called variable-window SDK (VW-SDK), which adaptively determines the shape of the parallel window that leads to the minimum computing cycles for a given convolutional layer and PIM array. By allowing rectangular-shaped windows with partial channels, VW-SDK utilizes the PIM array more efficiently, thereby further reducing the number of computing cycles. The simulation with a 512x512 PIM array and Resnet-18 shows that VW-SDK improves the inference speed by 1.69x compared to the existing SDK-based algorithm. |
14:38 CET | 11.3.3 | A UNIFORM LATENCY MODEL FOR DNN ACCELERATORS WITH DIVERSE ARCHITECTURES AND DATAFLOWS Speaker: Linyan Mei, KU Leuven, CN Authors: Linyan Mei1, Huichu Liu2, Tony Wu3, H. Ekin Sumbul2, Marian Verhelst1 and Edith Beigne2 1KU Leuven, BE; 2Facebook Inc., US; 3Meta/Facebook, US Abstract In the early design phase of a Deep Neural Network (DNN) acceleration system, fast energy and latency estimation are important to evaluate the optimality of different design candidates on algorithm, hardware, and algorithm-to-hardware mapping, given the gigantic design space. This work proposes a uniform intra-layer analytical latency model for DNN accelerators that can be used to evaluate diverse architectures and dataflows. It employs a 3-step approach to systematically estimate the latency breakdown of different system components, capture the operation state of each memory component, and identify stall-induced performance bottlenecks. To achieve high accuracy, different memory attributes, operands' memory sharing scenarios, as well as dataflow implications have been taken into account. Validation against an in-house taped-out accelerator across various DNN layers has shown an average latency model accuracy of 94.3%. To showcase the capability of the proposed model, we carry out 3 case studies to assess respectively the impact of mapping, workloads, and diverse hardware architectures on latency, driving design insights for algorithm-hardware-mapping co-optimization. |
14:42 CET | 11.3.4 | MEDEA: A MULTI-OBJECTIVE EVOLUTIONARY APPROACH TO DNN HARDWARE MAPPING Speaker: Enrico Russo, University of Catania, IT Authors: Enrico Russo1, Maurizio Palesi1, Salvatore Monteleone2, Davide Patti1, Giuseppe Ascia1 and Vincenzo Catania1 1University of Catania, IT; 2Università Niccolò Cusano, IT Abstract Embedded domain-specific accelerators for Deep Neural Networks (DNNs) enable inference on resource-constrained devices. Making optimal design choices and efficiently scheduling neural network algorithms on these specialized architectures is challenging. Many choices can be made to schedule computation spatially and temporally on the accelerator. Each choice influences the access pattern to the buffers of the architectural hierarchy, affecting the energy and latency of the inference. Each mapping also requires specific buffer capacities and a number of spatial component instances that translate into different chip area occupation. The space of possible combinations, the mapping space, is so large that automatic tools are needed for its rapid exploration and simulation. This work presents MEDEA, an open-source multi-objective evolutionary-algorithm-based approach to DNN accelerator mapping space exploration. MEDEA leverages the Timeloop analytical cost model. Unlike other schedulers that optimize towards a single objective, MEDEA allows deriving the Pareto set of mappings to optimize towards multiple, sometimes conflicting, objectives simultaneously. We found that the solutions found by MEDEA dominate, in most cases, those found by state-of-the-art mappers. |
14:46 CET | 11.3.5 | DIGAMMA: DOMAIN-AWARE GENETIC ALGORITHM FOR HW-MAPPING CO-OPTIMIZATION FOR DNN ACCELERATORS Speaker: Sheng-Chun Kao, Georgia Institute of Technology, US Authors: Sheng-Chun Kao1, Michael Pellauer2, Angshuman Parashar2 and Tushar Krishna1 1Georgia Institute of Technology, US; 2Nvidia, US Abstract The design of DNN accelerators includes two key parts: HW resource configuration and mapping strategy. Intensive research has been conducted to optimize each of them independently. Unfortunately, optimizing both together is extremely challenging due to the large cross-coupled search space. To address this, in this paper, we propose a HW-Mapping co-optimization framework, an efficient encoding of the immense design space constructed by HW and Mapping, and a domain-aware genetic algorithm, named DiGamma, with specialized operators for improving search efficiency. We evaluate DiGamma with seven popular DNN models with different properties. Our evaluations show DiGamma can achieve (geomean) 3.0x and 10.0x speedup, compared to the best-performing baseline optimization algorithms, in edge and cloud settings. |
14:50 CET | 11.3.6 | (Best Paper Award Candidate) ANACONGA: ANALYTICAL HW-CNN CO-DESIGN USING NESTED GENETIC ALGORITHMS Speaker: Nael Fasfous, TU Munich, DE Authors: Nael Fasfous1, Manoj Rohit Vemparala2, Alexander Frickenstein2, Emanuele Valpreda3, Driton Salihu1, Julian Höfer4, Anmol Singh2, Naveen-Shankar Nagaraja2, Hans-Joerg Voegel2, Nguyen Anh Vu Doan1, Maurizio Martina3, Juergen Becker4 and Walter Stechele1 1TU Munich, DE; 2BMW Group, DE; 3Politecnico di Torino, IT; 4Karlsruhe Institute of Technology, DE Abstract We present AnaCoNGA, an analytical co-design methodology, which enables two genetic algorithms to evaluate the fitness of design decisions on layer-wise quantization of a neural network and hardware (HW) resource allocation. We embed a hardware architecture search (HAS) algorithm into a quantization strategy search (QSS) algorithm to evaluate the hardware design Pareto-front of each considered quantization strategy. We harness the speed and flexibility of analytical HW-modeling to enable parallel HW-CNN co-design. With this approach, the QSS is focused on seeking high-accuracy quantization strategies which are guaranteed to have efficient hardware designs at the end of the search. Through AnaCoNGA, we improve the accuracy by 2.88 p.p. with respect to a uniform 2-bit ResNet20 on CIFAR-10, and achieve a 35% and 37% improvement in latency and DRAM accesses, while reducing LUT and BRAM resources by 9% and 59% respectively, when compared to a standard edge variant of the accelerator. |
14:54 CET | 11.3.7 | Q&A SESSION Authors: Jan Moritz Joseph1 and Elnaz Ansari2 1RWTH Aachen University, DE; 2Meta/Facebook, US Abstract Questions and answers with the authors |
11.4 Reconfigurable Systems
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Michaela Blott, Xilinx, IE
Session co-chair:
Shreejith Shanker, Trinity College Dublin, IE
This session presents six papers, three of which discuss innovative applications including adaptive CNN acceleration for edge scenarios, filtering for big data applications, and a graph processing accelerator. Two papers explore extensions to CGRA hardware to enable improved mapping, and a fast mapping algorithm for CGRAs. Finally, one paper explores technology mapping for FPGAs based on And-Inverter Cones.
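As a purely software analogue of the raw-filtering idea in paper 11.4.2 below, the following Python sketch shows an approximate pre-filter that discards JSON records before full parsing while tolerating false positives; it only illustrates the concept and is not the FPGA design itself:

```python
import json

# Approximate "raw filter" (software illustration of the concept in 11.4.2):
# cheap substring tests drop most irrelevant records before the costly parse.
# False positives are acceptable -- the exact predicate is re-checked after
# parsing -- but false negatives must never occur.

records = [
    '{"sensor": "temp", "value": 17, "room": "lab"}',
    '{"sensor": "humidity", "value": 55, "room": "lab"}',
    '{"sensor": "temp", "value": 23, "room": "office"}',
]

def raw_filter(line):
    """Keep the line only if it could match sensor == "temp"."""
    return '"temp"' in line            # may over-approximate, never under-approximates

def exact_predicate(obj):
    return obj.get("sensor") == "temp"

matches = [json.loads(line) for line in records if raw_filter(line)]
matches = [obj for obj in matches if exact_predicate(obj)]   # remove false positives
print(matches)
```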
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 11.4.1 | (Best Paper Award Candidate) ADAFLOW: A FRAMEWORK FOR ADAPTIVE DATAFLOW CNN ACCELERATION ON FPGAS Speaker: Guilherme Korol, Federal University of Rio Grande do Sul - Brazil, BR Authors: Guilherme Korol1, Michael Jordan2, Mateus Beck Rutzig3 and Antonio Carlos Schneider Beck1 1Universidade Federal do Rio Grande do Sul, BR; 2UFRGS, BR; 3UFSM, BR Abstract To meet latency and privacy requirements, resource-hungry deep learning applications have been migrating to the Edge, where IoT devices can offload the inference processing to local Edge servers. Since FPGAs have successfully accelerated an increasing number of deep learning applications (especially CNN-based ones), they emerge as an effective alternative for Edge platforms. However, Edge applications may present highly unpredictable workloads, requiring runtime adaptability in the inference processing. Although some works apply model switching on CPU and GPU platforms by exploiting different pruning rates at runtime, so the inference can adapt according to some quality-performance trade-off, FPGA-based accelerators refrain from this approach since they are synthesized to specific CNN models. In this context, this work enables model switching on FPGAs by adding to the well-known FINN accelerator an extra level of adaptability (i.e., flexibility) and support to the dynamic use of pruning via fast model switch on flexible accelerators, at the cost of some extra logic, or via FPGA reconfigurations of fixed accelerators. From that, we developed AdaFlow: a framework that automatically builds, at design time, a library from these new available versions (flexible and fixed, pruned or not) that will be used, at runtime, to dynamically select a given version according to a user-configurable accuracy threshold and current workload conditions. We have evaluated AdaFlow under a smart Edge surveillance application with two CNN models and two datasets, showing that AdaFlow processes, on average, 1.3x more inferences and increases, on average, 1.4x the power efficiency over state-of-the-art statically deployed dataflow accelerators. |
14:34 CET | 11.4.2 | RAW FILTERING OF JSON DATA ON FPGAS Speaker: Tobias Hahn, FAU, DE Authors: Tobias Hahn, Andreas Becher, Stefan Wildermann and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE Abstract Many Big Data applications include the processing of data streams on semi-structured data formats such as JSON. A disadvantage of such formats is that an application may spend a significant amount of processing time just on unselectively parsing all data. To alleviate this issue, the concept of raw filtering is proposed with the idea to remove data from a stream prior to the costly parsing stage. However, as accurate filtering of raw data is often only possible after the data has been parsed, raw filters are designed to be approximate in the sense of allowing false-positives in order to be implemented efficiently. Contrary to previously proposed CPU-based raw filtering techniques that are restricted to string matching, we present FPGA-based primitives for filtering strings, numbers and also number ranges. In addition, a primitive respecting the basic structure of JSON data is proposed that can be used to further increase the accuracy of introduced raw filters. The proposed raw filter primitives are designed to allow for their composition according to a given filter expression of a query. Thus, complex raw filters can be created for FPGAs which enable a drastic decrease in the amount of generated false-positives, particularly for IoT workloads. As there exists a trade-off between accuracy and resource consumption, we evaluate primitives as well as composed raw filters using different queries from the RiotBench benchmark. Our results show that up to 94.3% of the raw data can be filtered without producing any observed false-positives using only a few hundred LUTs. |
14:38 CET | 11.4.3 | GRAPHWAVE: A HIGHLY-PARALLEL COMPUTE-AT-MEMORY GRAPH PROCESSING ACCELERATOR Speaker: Jinho Lee, National University of Singapore, SG Authors: Jinho Lee, Burin Amornpaisannon, Tulika Mitra and Trevor E. Carlson, National University of Singapore, SG Abstract The fast, efficient processing of graphs is needed to quickly analyze and understand connected data, from large social network graphs, to edge devices performing timely, local data analytics. But, as graph data tends to exhibit poor locality, designing graph accelerators that are both high-performance and efficient has been difficult. In this work, GraphWave, we take a different approach compared to previous research and focus on maximizing accelerator parallelism with a compute-at-memory approach, where each vertex is paired with a dedicated functional unit. We also demonstrate that this work can improve performance and efficiency by optimizing the accelerator's interconnect with multi-level multi-casting to minimize congestion. Taken together, this work achieves, to the best of our knowledge, a state-of-the-art efficiency of up to 63.94 GTEPS/W with a throughput of 97.80 GTEPS (billion traversed edges per second). |
14:42 CET | 11.4.4 | RF-CGRA: A ROUTING-FRIENDLY CGRA WITH HIERARCHICAL REGISTER CHAINS Speaker: Dajiang Liu, Chongqing University, CN Authors: Rong Zhu, Bo Wang and Dajiang Liu, Chongqing University, CN Abstract CGRAs are promising architectures to accelerate domain-specific applications as they combine high energy-efficiency and flexibility. With either isolated register files (RFs) or link-consuming distributed registers in each processing element (PE), existing CGRAs are not friendly to data routing for data-flow graphs (DFGs) with a high edge/node ratio, since there are many multi-cycle dependences. To this end, this paper proposes a Routing-Friendly CGRA (RF-CGRA) where hierarchical (intra-PE or inter-PE) register chains can be achieved for data routing both flexibly (with a wide range of chain lengths) and compactly (consuming fewer links among PEs), resulting in a new mapping problem that requires an improved compiler. Experimental results show that RF-CGRA achieves 1.19X the performance and 1.14X the energy efficiency of the state-of-the-art CGRA with single-cycle multi-hop connections (HyCUBE) while keeping a moderate compilation time. |
14:46 CET | 11.4.5 | PATHSEEKER: A FAST MAPPING ALGORITHM FOR CGRAS Speaker: Mahesh Balasubramanian, Arizona State University, US Authors: Mahesh Balasubramanian and Aviral Shrivastava, Arizona State University, US Abstract Coarse-grained reconfigurable arrays (CGRAs) have gained traction over the years as a low-power accelerator due to the efficient mapping of the compute-intensive loops onto the 2-D array by the CGRA compiler. When encountering a mapping failure for a given node, existing mapping techniques either exit and retry the mapping anew, or perform backtracking, i.e., recursively remove the previously mapped node to find a valid mapping. Abandoning mapping and starting afresh can deteriorate the quality of mapping and the compilation time. Even backtracking may not be the best choice since the previous node may not be the incorrectly placed node. To tackle this issue, we propose PathSeeker -- a mapping approach that analyzes mapping failures and performs local adjustments to the schedule to obtain a mapping. Experimental results on 35 top performance-critical loops from MiBench, Rodinia, and Parboil benchmark suites demonstrate that PathSeeker can map all of them with better mapping quality and dramatically less compilation time than the previous state-of-the-art approaches -- GraphMinor and RAMP, which were unable to map 20 and 5 loops, respectively. Over these benchmarks, PathSeeker achieves 28% better performance at 550x compilation speedup over GraphMinor and 3% better performance at 10x compilation speedup over RAMP on a 4x4 CGRA. |
14:50 CET | 11.4.6 | IMPROVING TECHNOLOGY MAPPING FOR AIC-BASED FPGAS Speaker: Shubham Rai, TU Dresden, DE Authors: Martin Thümmler, Shubham Rai and Akash Kumar, TU Dresden, DE Abstract Commonly, LUTs are used in FPGAs as their main source of configurability. But these large multiplexers have only one output and their area scales exponentially with the number of inputs. As a counterpart, AND-Inverter Cones (AICs) were proposed in 2012. They are a cone-like structure of configurable gates. AICs are not as flexibly configurable as LUTs, but have multiple major benefits: First, their structure is inspired by And-Inverter Graphs, which are currently the predominant form to represent and optimize digital hardware circuits. Second, they provide multiple outputs and are intrinsically fracturable. Therefore, logic duplication can be reduced. Additionally, physical AICs can be split into multiple smaller ones without any additional hardware effort. Third, their area scales linearly with the exponentially increasing number of inputs. Additionally, a special form of AICs called Nand-Nor-Cones can be implemented very efficiently, especially for the newly emerging RFET technologies. Technology mapping is one of the crucial tasks to release the full power of AIC-based FPGAs. In this work, the current technology mapping algorithms are reviewed and the following improvements are proposed: First, instead of calculating the required time by choices, a direct required time calculation method is presented. This ensures that every node has a sensible required time assigned. Second, it is shown that the priority cut calculation method can be replaced by a much simpler direct cut selection method with reduced runtime and similar quality of results. Third, a local subgraph balancing is proposed to reduce the cone sizes to which cuts get mapped. Combining all of these improvements leads to an average area reduction of over 20% for the MCNC benchmarks compared to the previous technology mapper, while not increasing the average circuit delay. Similar improvements are presented for the VTR benchmarks. Additionally, a mapping algorithm to NNCs with three inputs per gate is provided for the first time. Finally, the technology mapper is integrated open-source into the logic synthesis and verification system ABC. |
14:54 CET | 11.4.7 | Q&A SESSION Authors: Michaela Blott1 and Shanker Shreejith2 1Xilinx, IE; 2Trinity College Dublin, IE Abstract Questions and answers with the authors |
11.5 An Industrial Perspective on Autonomous Systems Design
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Rolf Ernst, TU Braunschweig, DE
Session co-chair:
Selma Saidi, TU Dortmund, DE
This session presents four talks from industry sharing current practices and perspectives on autonomous systems and their design. The session discusses several challenges related to software architecture solutions for safe and efficient operational autonomous systems, novel rule-based methods for guaranteeing safety, and requirements on the infrastructure for autonomy, which is currently merging the CPS and IT domains.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 11.5.1 | SYMBIOTIC SAFETY: SAFE AND EFFICIENT HUMAN-MACHINE COLLABORATION BY UTILIZING RULES Speaker: Tasuku Ishigooka, Hitachi, Ltd., JP Authors: Tasuku Ishigooka, Hiroyuki Yamada, Satoshi Otsuka, Nobuyasu Kanekawa and Junya Takahashi, Hitachi, Ltd., JP Abstract Collaborative work between workers and autonomous systems in the same area is required to improve operation efficiency. However, there exist collision risks caused by the coexistence of workers and autonomous systems. The safety functions of the autonomous systems, such as emergency stops, can reduce the risks but may decrease the operation efficiency. Therefore, we propose a novel safety concept called Symbiotic Safety. The concept improves both safety and operation efficiency by transformation of the action plan, e.g., adjustment of the action plan or update of the safety rules, which reduces the frequency of risk occurrence and suppresses efficiency loss due to safety functions. In this paper, we explain the symbiotic safety technologies and share the results of an evaluation experiment using our prototype system. |
14:45 CET | 11.5.2 | A MIDDLEWARE JOURNEY FROM MICROCONTROLLERS TO MICROPROCESSORS Speaker: Alban Tamisier, Apex.AI, FR Authors: Michael Pöhnl, Alban Tamisier and Tobias Blaß, Apex.AI, DE Abstract This paper discusses some of the challenges we encountered when developing Apex.OS, an automotive grade version of the Robot Operating System (ROS) 2. To better understand these challenges, we look back at the best practices used for data communication and software execution in OSEK-based systems. Finally we describe the extensions made in ROS 2, Apex.OS and Apex.Middleware to meet the real-time constraints of the targeted automotive systems. |
15:00 CET | 11.5.3 | RELIABLE DISTRIBUTED SYSTEMS Speaker: Philipp Mundhenk, Robert Bosch GmbH, DE Authors: Philipp Mundhenk, Arne Hamann, Andreas Heyl and Dirk Ziegenbein, Robert Bosch GmbH, DE Abstract The domains of Cyber-Physical Systems (CPSs) and Information Technology (IT) are converging. Driven by the need for increased compute performance, as well as the need for increased connectivity and runtime flexibility, IT hardware, such as microprocessors and Graphics Processing Units (GPUs), as well as software abstraction layers are introduced to CPS. These systems and components are being enhanced for the execution of hard real-time applications. This enables the convergence of embedded and IT: Embedded workloads can be executed reliably on top of IT infrastructure. This is the dawn of Reliable Distributed Systems (RDSs), a technology that combines the performance and cost of IT systems with the reliability of CPSs. The Fabric is a global RDS runtime environment, weaving the interconnections between devices and enabling abstractions for compute, communication, storage, sensing & actuation. This paper outlines the vision of RDS, introduces the aspects required for implementing RDSs and the Fabric, relates existing technologies, and outlines open research challenges. |
15:15 CET | 11.5.4 | PAVE 360 - A PARADIGM SHIFT IN AUTONOMOUS DRIVING VERIFICATION WITH A DIGITAL TWIN Speaker and Author: Tapan Vikas, Siemens EDA GmbH, DE Abstract The talk will showcase the benefits of architectural exploration based on Digital Twin approaches. The challenges involved in state-of-the-art Digital Twins will be highlighted. Hardware/software co-design challenges will be discussed briefly. |
12.1 AI as a Driver for Innovative Applications
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Xun Jiao, University of Villanova, US
Session co-chair:
Srinivas Katkoori, University of South Florida, US
This session exploits different AI architectures and methodologies for creating innovative applications. These impact several fields, ranging from brain-inspired computing through the Internet of Things to Industry 4.0.
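Two of the papers below build on brain-inspired hyperdimensional computing (HDC). The following Python sketch shows the basic HDC operations (random bipolar hypervectors, binding, bundling, similarity) as textbook-style background; it is not the authors' code, and the toy record it encodes is assumed for illustration:

```python
import random

DIM = 10000                                   # typical hypervector dimensionality

def random_hv():
    """Random bipolar hypervector (+1/-1 entries)."""
    return [random.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):                               # element-wise multiply: associates two items
    return [x * y for x, y in zip(a, b)]

def bundle(vectors):                          # element-wise majority: superposes items
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

def similarity(a, b):                         # normalized dot product in [-1, 1]
    return sum(x * y for x, y in zip(a, b)) / DIM

# Encode a toy record {colour: red, shape: square} and query it.
colour, shape = random_hv(), random_hv()
red, square, circle = random_hv(), random_hv(), random_hv()

record = bundle([bind(colour, red), bind(shape, square)])

# Unbinding with the "shape" key recovers something close to "square".
probe = bind(record, shape)
print("square:", round(similarity(probe, square), 2),
      " circle:", round(similarity(probe, circle), 2))
```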
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 12.1.1 | (Best Paper Award Candidate) ALGORITHM-HARDWARE CO-DESIGN FOR EFFICIENT BRAIN-INSPIRED HYPERDIMENSIONAL LEARNING ON EDGE Speaker: Yang Ni, University of California, Irvine, US Authors: Yang Ni1, Yeseong Kim2, Tajana S. Rosing3 and Mohsen Imani4 1University of California, Irvine, US; 2DGIST, KR; 3UCSD, US; 4University of California Irvine, US Abstract Machine learning methods have been widely utilized to provide high quality for many cognitive tasks. Running sophisticated learning tasks requires high computational costs to process a large amount of learning data. Brain-inspired Hyperdimensional (HD) computing is introduced as an alternative solution for lightweight learning on edge devices. However, HD computing models still rely on accelerators to ensure real-time and efficient learning. These hardware designs are not commercially available and need a relatively long period to synthesize and fabricate after deriving the new applications. In this paper, we propose an efficient framework for accelerating the HD computing at the edge by fully utilizing the available computing power. We optimize the HD computing through algorithm-hardware co-design of the host CPU and existing low-power machine learning accelerators, such as Edge TPU. We interpret the lightweight HD learning model as a hyper-wide neural network to take advantage of the accelerator and machine learning platform. We further improve the runtime cost of training by employing a bootstrap aggregating algorithm called bagging while maintaining the learning quality. We evaluate the performance of the proposed framework with several applications. Joint experiments on mobile CPU and the Edge TPU show that our framework achieves 4.5× faster training and 4.2× faster inference compared to the baseline platform. In addition, our framework achieves 19.4× faster training and 8.9× faster inference as compared to embedded ARM CPU, Raspberry Pi, that consumes similar power consumption. |
15:44 CET | 12.1.2 | POISONHD: POISON ATTACK ON BRAIN-INSPIRED HYPERDIMENSIONAL COMPUTING Speaker: Xun Jiao, Villanova University, US Authors: Ruixuan Wang1 and Xun Jiao2 1VU, US; 2Villanova University, US Abstract While machine learning (ML) methods, especially deep neural networks (DNNs), promise enormous societal and economic benefits, their deployments present daunting challenges due to intensive computational demands and high storage requirements. Brain-inspired hyperdimensional computing (HDC) has recently been introduced as an alternative computational model that mimics the "human brain" at the functionality level. HDC has already demonstrated promising accuracy and efficiency in multiple application domains including healthcare and robotics. However, the robustness and security aspects of HDC have not been systematically investigated and sufficiently examined. The poison attack is a commonly seen attack on various ML models including DNNs. It injects noise into the labels of training data to introduce classification errors in ML models. This paper presents PoisonHD, an HDC-specific poison attack framework that maximizes its effectiveness in degrading the classification accuracy by leveraging the internal structural information of HDC models. By applying PoisonHD on three datasets, we show that PoisonHD can cause a significantly greater accuracy drop on the HDC model than a random label flipping approach. We further develop a defense mechanism by designing an HDC-based data sanitization that can fully recover the accuracy loss caused by the poison attack. To the best of our knowledge, this is the first paper that studies the poison attack on HDC models. |
15:48 CET | 12.1.3 | AIME: WATERMARKING AI MODELS BY LEVERAGING ERRORS Speaker: Dhwani Mehta, University of Florida, US Authors: Dhwani Mehta, Nurun Mondol, Farimah Farahmandi and Mark Tehranipoor, University of Florida, US Abstract The recent evolution of deep neural networks (DNNs) has made it feasible to run complex data analytics tasks, which range from natural language processing and object detection to autonomous cars, artificial intelligence (AI) warfare, cloud, healthcare, industrial robots, and edge devices. The benefits of AI are indisputable. However, there are several concerns regarding the security of the deployed AI models, such as reverse engineering and Intellectual Property (IP) piracy. Accumulating a sufficiently large amount of data, then building, training, and improving the model accuracy, and finally deploying the model requires immense human and computational power, making the process expensive. Therefore, it is of utmost importance to protect the model against IP infringement. We propose AIME, a novel watermarking framework that captures model inaccuracy during the training phase and converts it into an owner-specific unique signature. The watermark is embedded within the class mispredictions of the DNN model. Watermark extraction is performed when the model is queried by an owner-specific sequence of key inputs, and the signature is decoded from the sequence of model predictions. AIME works with negligible watermark embedding runtime overhead while preserving the accurate functionality of the DNN. We have performed a comprehensive evaluation of AIME with models on the MNIST, Fashion-MNIST, and CIFAR-10 datasets and corroborated its effectiveness, robustness, and performance. |
15:52 CET | 12.1.4 | THINGNET: A LIGHTWEIGHT REAL-TIME MIRAI IOT VARIANTS HUNTER THROUGH CPU POWER FINGERPRINTING Speaker: Zhuoran Li, Old Dominion University, US Authors: Zhuoran Li and Danella Zhao, Old Dominion University, US Abstract Internet of Things (IoT) devices have become attractive targets of cyber criminals, and attackers have been leveraging these vulnerable devices most notably via the infamous Mirai-based botnets, accounting for nearly 90% of IoT malware attacks in 2020. In this work, we propose a robust, universal and non-invasive Mirai-based malware detection engine employing a compact deep neural network architecture. Our design allows programmatic collection of CPU power footprints with integrated current sensors under various device states, such as idle, service and attack. A lightweight online inference model is deployed in the CPU for on-the-fly classification. Our model is robust against noisy environments thanks to a lucid design of the noise reduction function. This work appears to be the first step towards a viable CPU malware detection engine based on power fingerprinting. The extensive simulation study under the ARM architecture that is widely used in IoT devices demonstrates a high detection accuracy of 99.1% at a latency of less than 1 ms. By analyzing Mirai-based infection under distinguishable phases for power feature extraction, our model has further demonstrated an accuracy of 96.3% on the detection of model-unknown variants. |
15:56 CET | 12.1.5 | M2M-ROUTING: ENVIRONMENTAL ADAPTIVE MULTI-AGENT REINFORCEMENT LEARNING BASED MULTI-HOP ROUTING POLICY FOR SELF-POWERED IOT SYSTEMS Speaker: Wen Zhang, Texas A&M- Corpus Christi, US Authors: Wen Zhang1, Jun Zhang2, Mimi Xie3, Tao Liu4, Wenlu Wang1 and Chen Pan5 1Texas A&M University--Corpus Christi, US; 2Harvard University, US; 3University of Texas at San Antonio, US; 4Lawrence Technological University, US; 5Texas A&M University-Corpus Christi, US Abstract Energy harvesting (EH) technologies facilitate the trending proliferation of IoT devices with sustainable power supplies. However, the intrinsic weak and unstable nature of EH results in frequent and unpredictable power interruptions in EH IoT devices, which further causes unpleasant packet loss or reconnection failures in the IoT network. Therefore, conventional routing and energy allocation methods are inefficient in EH environments. The complexity of the EH environment is a stumbling block to intelligent routing and energy allocation. To address these problems, this work proposes an environment-adaptive Deep Reinforcement Learning (DRL)-based multi-hop routing policy, M2M-Routing, to jointly optimize energy allocation and the routing policy, which conquers these challenges by leveraging offline computation resources. We prepare multiple models offline for the complicated energy harvesting environment. By searching for a similar historical power trace to identify the model ID, the prepared DRL model is selected to manage energy allocation and the routing policy for the query power traces. Simulation results indicate that M2M-Routing improves the amount of data delivery by 3 times to 4 times compared with baselines. |
16:00 CET | 12.1.6 | Q&A SESSION Authors: Xun Jiao1 and Srinivas Katkoori2 1Villanova University, US; 2University of South Florida, US Abstract Questions and answers with the authors |
12.2 Applications of optimized quantum and probabilistic circuits in emergent computing systems
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Giulia Meuli, Synopsys, IT
Session co-chair:
Yvain Thonnart, CEA, FR
Emerging computing platforms such as near-term quantum computers are currently based on the execution of circuits with a gate set that is specific to the corresponding hardware platform. These systems are still strongly impacted by noise and decoherence, and the success of such calculations depends strongly on the circuit depth and on the effort required to input data. This session discusses the use of classical and machine learning approaches to optimize these circuits both in complexity and in noise resilience.
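The session also touches on stochastic computing (paper 12.2.5). As orientation only, the Python sketch below shows the classical stochastic-computing trick of multiplying two probabilities with a bitwise AND of their random bitstreams; the stream length and operand values are assumed, and the sketch does not reflect the architecture proposed in the paper:

```python
import random

# Stochastic computing basics (generic illustration, not the architecture of 12.2.5):
# a value p in [0, 1] is encoded as a bitstream whose bits are 1 with probability p,
# and multiplication of two such values reduces to a bitwise AND of their streams.

N = 4096                                       # stream length (assumed)

def encode(p):
    return [1 if random.random() < p else 0 for _ in range(N)]

def decode(stream):
    return sum(stream) / len(stream)

a, b = 0.75, 0.40
product_stream = [x & y for x, y in zip(encode(a), encode(b))]

print("exact  :", a * b)
print("SC est.:", round(decode(product_stream), 3))   # accuracy grows with stream length
```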
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 12.2.1 | MUZZLE THE SHUTTLE: EFFICIENT COMPILATION FOR MULTI-TRAP TRAPPED-ION QUANTUM COMPUTERS Speaker: Abdullah Ash Saki, Pennsylvania State University, US Authors: Abdullah Ash- Saki1, Rasit Onur Topaloglu2 and Swaroop Ghosh1 1Pennsylvania State University, US; 2IBM, US Abstract Trapped-ion systems can have a limited number of ions (qubits) in a single trap. Increasing the qubit count to run meaningful quantum algorithms would require multiple traps where ions need to shuttle between traps to communicate. The existing compiler has several limitations, which result in a high number of shuttle operations and degraded fidelity. In this paper, we target this gap and propose compiler optimizations to reduce the number of shuttles. Our technique achieves a maximum reduction of 51.17% in shuttles (average ~ 33%) tested over 125 circuits. Furthermore, the improved compilation enhances the program fidelity up to 22.68X with a modest increase in the compilation time. |
15:44 CET | 12.2.2 | CIRCUITS FOR MEASUREMENT BASED QUANTUM STATE PREPARATION Speaker: Niels Gleinig, ETH Zurich, DE Authors: Niels Gleinig and Torsten Hoefler, ETH Zürich, CH Abstract In quantum computing, state preparation is the problem of synthesizing circuits that initialize quantum systems to specific states. It has been shown that there are states that require circuits of exponential size to be prepared (when not using measurements), and consequently, despite extensive research on this problem, the existing computer-aided design (CAD) methods produce circuits of exponential size. This is even the case for the methods that solve this problem on the important subclass of uniform states, which for example need to be prepared when using Quantum Simulated Annealing algorithms to solve combinatorial optimization problems. In this paper, we show how CAD based state preparation can be made scalable by using techniques that are unique to quantum computing: amplitude amplification, measurements, and the resulting state collapses. With this approach, we are able to produce wide classes of states in polynomial time, resulting in an exponential improvement over existing CAD methods. |
15:48 CET | 12.2.3 | OPTIC: A PRACTICAL QUANTUM BINARY CLASSIFIER FOR NEAR-TERM QUANTUM COMPUTERS Speaker: Daniel Silver, Northeastern University, US Authors: Tirthak Patel, Daniel Silver and Devesh Tiwari, Northeastern University, US Abstract Quantum computers can theoretically speed up optimization workloads such as variational machine learning and classification workloads over classical computers. However, in practice, proposed variational algorithms have not been able to run on existing quantum computers for practical-scale problems owing to their error-prone hardware. We propose OPTIC, a framework to effectively execute quantum binary classification on real noisy intermediate-scale quantum (NISQ) computers. |
15:52 CET | 12.2.4 | SCALABLE VARIATIONAL QUANTUM CIRCUITS FOR AUTOENCODER-BASED DRUG DISCOVERY Speaker: Junde Li, Pennsylvania State University, US Authors: Junde Li and Swaroop Ghosh, Pennsylvania State University, US Abstract The de novo design of drug molecules is recognized as a time-consuming and costly process, and computational approaches have been applied in each stage of the drug discovery pipeline. The variational autoencoder is one of the computer-aided design methods that explore the chemical space based on an existing molecular dataset. Quantum machine learning has emerged as an atypical learning method that may speed up some classical learning tasks because of its strong expressive power. However, near-term quantum computers suffer from a limited number of qubits, which hinders representation learning in high-dimensional spaces. We present a scalable quantum generative autoencoder (SQ-VAE) for simultaneously reconstructing and sampling drug molecules, and a corresponding vanilla variant (SQ-AE) for better reconstruction. Architectural strategies for hybrid quantum-classical networks, such as adjustable quantum layer depth, heterogeneous learning rates, and patched quantum circuits, are proposed to learn high-dimensional datasets such as ligand-targeted drugs. Extensive experimental results are reported for different dimensions including 8x8 and 32x32 after choosing suitable architectural strategies. The performance of the quantum generative autoencoder is compared with the corresponding classical counterpart throughout all experiments. The results show that quantum computing advantages can be achieved for normalized low-dimension molecules, and that high-dimension molecules generated from quantum generative autoencoders have better drug properties within the same learning period. |
15:56 CET | 12.2.5 | TOWARDS LOW-COST HIGH-ACCURACY STOCHASTIC COMPUTING ARCHITECTURE FOR UNIVARIATE FUNCTIONS: DESIGN AND DESIGN SPACE EXPLORATION Speaker: Kuncai Zhong, Shanghai Jiao Tong University, CN Authors: Kuncai Zhong, Zexi Li and Weikang Qian, Shanghai Jiao Tong University, CN Abstract Univariate functions are widely used. Several recent works propose to implement them by an unconventional computing paradigm, stochastic computing (SC). However, existing SC designs either have a high hardware cost due to the area consuming randomizer or a low accuracy. In this work, we propose a low-cost high-accuracy SC architecture for univariate functions. It consists of only a single stochastic number generator and a minimum number of D flip-flops. We also apply three methods, random number source (RNS) negating, RNS scrambling, and input scrambling, to improve the accuracy of the architecture. To efficiently configure the architecture to achieve a high accuracy, we further propose a design space exploration algorithm. The experimental results show that compared to the conventional architecture, the area of the proposed architecture is reduced by up to 76%, while its accuracy is close to or sometimes even higher than that of the conventional architecture. |
16:00 CET | 12.2.6 | Q&A SESSION Authors: Giulia Meuli1 and Yvain Thonnart2 1Synopsys, IT; 2CEA-Leti, FR Abstract Questions and answers with the authors |
12.3 Reliable, safe and approximate systems
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Angeliki Kritikakou, IRISA, FR
Session co-chair:
Marcello Traiola, INRIA, FR
This session presents techniques for reliable, safe, and approximate computing across many different architectures, ranging from traditional systems to neural network accelerators and hyper-dimensional computing.
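Several papers in this session assess resilience through fault injection. As a hedged illustration of the general idea (not the methodology of any specific paper), the snippet below flips a single bit in a float32 weight array, the kind of transient fault such campaigns emulate.

```python
# Minimal single-bit fault-injection sketch on a float32 weight tensor
# (hypothetical helper, for illustration only).
import numpy as np

def flip_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    """Return a copy of `weights` with one bit flipped in element `index`."""
    faulty = weights.copy()
    as_int = faulty.view(np.uint32)       # reinterpret the float32 bit patterns
    as_int[index] ^= np.uint32(1 << bit)  # inject the transient fault
    return faulty

w = np.random.randn(1024).astype(np.float32)
w_faulty = flip_bit(w, index=42, bit=30)  # flip a high-order exponent bit
print("max abs deviation:", np.abs(w_faulty - w).max())
```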
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 12.3.1 | (Best Paper Award Candidate) DO TEMPERATURE AND HUMIDITY EXPOSURES HURT OR BENEFIT YOUR SSDS? Speaker: Adnan Maruf, Florida International University, US Authors: Adnan Maruf1, Sashri Brahmakshatriya1, Baolin Li2, Devesh Tiwari2, Gang Quan1 and Janki Bhimani1 1Florida International University, US; 2Northeastern University, US Abstract SSDs are becoming mainstream data storage devices, replacing HDDs in most data centers, consumer goods, and IoT gadgets. In this work, we ask an uncharted research question: What is the environmental conditions' impact on SSD performance? To answer it, we systematically measure, quantify, and characterize the impact of various commonly changing environmental conditions such as temperature and humidity on the performance of SSDs. Our experiments and analysis uncover that exposure to changes in temperature and humidity can significantly affect SSD performance. |
15:44 CET | 12.3.2 | SAFEDM: A HARDWARE DIVERSITY MONITOR FOR REDUNDANT EXECUTION ON NON-LOCKSTEPPED CORES Speaker: Francisco Bas, Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (UPC), ES Authors: Francisco Bas1, Pedro Benedicte2, Sergi Alcaide1, Guillem Cabo2, Fabio Mazzocchetti2 and Jaume Abella2 1Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES; 2Barcelona Supercomputing Center, ES Abstract Computing systems in the safety domain, such as those in avionics or space, require specific safety measures related to the criticality of the deployment. A problem these systems face is that of transient failures in hardware. A solution commonly used to tackle potential failures is to introduce redundancy in these systems, for example 2 cores that execute the same program at the same time. However, redundancy does not solve all potential failures, such as Common Cause Failures (CCF), where a single fault affects both cores identically (e.g. a voltage droop). If both redundant cores have identical state when the fault occurs, then there may be a CCF since the fault can affect both cores in the same way. To avoid CCF it is critical to know that there is diversity in the execution amongst the redundant cores. In this paper we introduce SafeDM, a hardware Diversity Monitor that quantifies the diversity of each redundant processor to guarantee that CCF will not go unnoticed, and without needing to deploy lockstepped cores. SafeDM computes data and instruction diversity separately, using different techniques appropriate for each case. We integrate SafeDM in a RISC-V FPGA space MPSoC from Cobham Gaisler where SafeDM is proven effective with a large benchmark suite, incurring low area and power overheads. Overall, SafeDM is an effective hardware solution to quantify diversity in cores performing redundant execution. |
15:48 CET | 12.3.3 | IS APPROXIMATION UNIVERSALLY DEFENSIVE AGAINST ADVERSARIAL ATTACKS IN DEEP NEURAL NETWORKS? Speaker: Ayesha Siddique, University of Missouri, US Authors: Ayesha Siddique and Khaza Anuarul Hoque, University of Missouri, US Abstract Approximate computing is known for its effectiveness in improving the energy efficiency of deep neural network (DNN) accelerators at the cost of slight accuracy loss. Very recently, the inexact nature of approximate components, such as approximate multipliers, has also been reported successful in defending against adversarial attacks on DNN models. Since the approximation errors traverse through the DNN layers as masked or unmasked, this raises a key research question: can approximate computing always offer a defense against adversarial attacks in DNNs, i.e., are they universally defensive? Towards this, we present an extensive adversarial robustness analysis of different approximate DNN accelerators (AxDNNs) using the state-of-the-art approximate multipliers. In particular, we evaluate the impact of ten adversarial attacks on different AxDNNs using the MNIST and CIFAR-10 datasets. Our results demonstrate that adversarial attacks on AxDNNs can cause 53% accuracy loss whereas the same attack may lead to almost no accuracy loss (as low as 0.06%) in the accurate DNN. Thus, approximate computing cannot be referred to as a universal defense strategy against adversarial attacks. |
15:52 CET | 12.3.4 | RELIABILITY ANALYSIS OF A SPIKING NEURAL NETWORK HARDWARE ACCELERATOR Speaker: Theofilos Spyrou, Sorbonne University, CNRS, LIP6, FR Authors: Theofilos Spyrou1, Sarah A. Elsayed1, Engin Afacan2, Luis A. Camuñas Mesa3, Barnabé Linares-Barranco3 and Haralampos-G. Stratigopoulos1 1Sorbonne Université, CNRS, LIP6, FR; 2Gebze TU, TR; 3IMSE-CNM, CSIC, University of Sevilla, ES Abstract Despite the parallelism and sparsity in neural network models, their transfer into hardware unavoidably makes them susceptible to hardware-level faults. Hardware-level faults can occur either during manufacturing, such as physical defects and process-induced variations, or in the field due to environmental factors and aging. The performance under fault scenarios needs to be assessed so as to develop cost-effective fault-tolerance schemes. In this work, we assess the resilience characteristics of a hardware accelerator for Spiking Neural Networks (SNNs) designed in VHDL and implemented on an FPGA. The fault injection experiments pinpoint the parts of the design that need to be protected against faults, as well as the parts that are inherently fault-tolerant. |
15:56 CET | 12.3.5 | RELIABILITY OF GOOGLE’S TENSOR PROCESSING UNITS FOR EMBEDDED APPLICATIONS Speaker: Rubens Luiz Rech Junior, Institute of Informatics, UFRGS, BR Authors: Rubens Luiz Rech Junior1 and Paolo Rech2 1UFRGS, BR; 2LANL/UFRGS, US Abstract Convolutional Neural Networks (CNNs) have become the most used and efficient way to identify and classify objects in a scene. CNNs are today fundamental not only for autonomous vehicles, but also for Internet of Things (IoT) and smart cities or smart homes. Vendors are developing low-power, efficient, and low-cost dedicated accelerators to allow the execution of the computationally demanding CNNs even in embedded applications with strict power and cost budgets. Google's Coral Tensor Processing Unit (TPU) is one of the latest low-power accelerators for CNNs. In this paper we investigate the reliability of TPUs to atmospheric neutrons, reporting experimental data equivalent to more than 30 million years of natural irradiation. We analyze the behavior of TPUs executing atomic operations (standard or depthwise convolutions) with increasing input sizes as well as eight CNN designs typical of embedded applications, including transfer learning and reduced data-set configurations. We found that, despite the high error rate, most neutron-induced errors only slightly modify the convolution output and do not change the CNNs' detection or classification. By reporting details about the fault model and error rate, we provide valuable information on how to evaluate and improve the reliability of CNNs executed on a TPU. |
16:00 CET | 12.3.6 | Q&A SESSION Authors: Angeliki Kritikakou1 and Marcello Traiola2 1Univ Rennes, Inria, CNRS, IRISA, FR; 2Inria / IRISA, FR Abstract Questions and answers with the authors |
12.4 Raising Performance and Reliability of the Memory Subsystem
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Leonidas Kosmidis, Barcelona Supercomputing Center, ES
Session co-chair:
Thaleia Dimitra Doudali, IMDEA Software Institute, ES
Performance and reliability are important considerations for modern architectures. This session includes papers addressing these concerns with novel physical design paradigms in EDA and emerging memory technologies. The first three papers lie in the intersection of architecture and physical design with 3D stacking, Silicon Carbide and an automated flow for taping-out GPU designs. The next three papers include a solution for an adaptive error correction scheme in DRAM, a reduced latency logging for crash recovery in systems based on persistent memory, as well as an in-memory accelerator based on Resistive RAM for bioinformatics.
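To make the ECC discussion concrete, here is a toy single-error-correcting, double-error-detecting (SECDED) code over a 4-bit data word; real DRAM ECC, including the adaptive scheme presented in this session, works on 64-bit words and adds further mechanisms, so this is only a didactic sketch.

```python
# Toy SECDED example: Hamming(7,4) plus an overall parity bit.
def encode(d):                                   # d: list of 4 data bits
    c = [0, 0, d[0], 0, d[1], d[2], d[3]]        # codeword positions 1..7
    c[0] = c[2] ^ c[4] ^ c[6]                    # p1 covers positions 1,3,5,7
    c[1] = c[2] ^ c[5] ^ c[6]                    # p2 covers positions 2,3,6,7
    c[3] = c[4] ^ c[5] ^ c[6]                    # p3 covers positions 4,5,6,7
    overall = 0
    for b in c:
        overall ^= b                             # extra parity bit enables DED
    return c + [overall]

def correct(word):                               # word: 8 bits from encode()
    c, overall = word[:7], word[7]
    syndrome = 0
    for pos in range(1, 8):                      # recompute the syndrome
        if c[pos - 1]:
            syndrome ^= pos
    parity_ok = (sum(c) + overall) % 2 == 0
    if syndrome and not parity_ok:               # single-bit error: flip it back
        c[syndrome - 1] ^= 1
    elif syndrome and parity_ok:
        raise ValueError("double-bit error detected")
    return [c[2], c[4], c[5], c[6]]              # recovered data bits

cw = encode([1, 0, 1, 1])
cw[5] ^= 1                                       # inject a single-bit error
assert correct(cw) == [1, 0, 1, 1]
```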
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 12.4.1 | STEALTH ECC: A DATA-WIDTH AWARE ADAPTIVE ECC SCHEME FOR DRAM ERROR RESILIENCE Speaker: Young Seo Lee, Korea University, KR Authors: Young Seo Lee1, Gunjae Koo1, Young-Ho Gong2 and Sung Woo Chung1 1Korea University, KR; 2KwangWoon University, KR Abstract As DRAM process technology scales down and DRAM density continues to grow, DRAM errors have become a primary concern in modern data centers. Typically, data centers have adopted memory systems with a single error correction double error detection (SECDED) code. However, the SECDED code is not sufficient to satisfy DRAM reliability demands as memory systems get more vulnerable. Though the servers in data centers employ strong ECC schemes such as Chipkill, such ECC schemes lead to substantial performance and/or storage overhead. In this paper, we propose Stealth ECC, a cost-effective memory protection scheme providing stronger error correctability than the conventional SECDED code, with negligible performance overhead and without storage overhead. Depending on the data-width (either narrow-width or full-width), Stealth ECC adaptively selects ECC schemes. For narrow-width values, Stealth ECC provides multi-bit error correctability by storing more parity bits in MSB side, instead of zeros. Furthermore, with bitwise interleaved data placement between x4 DRAM chips, Stealth ECC is robust to a single DRAM chip error for narrow-width values. On the other hand, for full-width values, Stealth ECC adopts the SECDED code, which maintains DRAM reliability comparable to the conventional SECDED code. As a result, thanks to the reliability improvement of narrow-width values, Stealth ECC enhances overall DRAM reliability, while incurring negligible performance overhead as well as no storage overhead. Our simulation results show that Stealth ECC reduces the probability of system failure (caused by DRAM errors) by 47.9%, on average, with only 0.9% performance overhead compared to the conventional SECDED code. |
15:44 CET | 12.4.2 | ACCELERATE HARDWARE LOGGING TO EFFICIENTLY GUARANTEE PM CRASH CONSISTENCY Speaker: Zhiyuan Lu, Michigan Tech. University, US Authors: Zhiyuan Lu1, Jianhui Yue1, Yifu Deng1 and Yifeng Zhu2 1Michigan Tech. University, US; 2University of Maine, US Abstract While logging has been adopted in persistent memory (PM) to support crash consistency, logging incurs severe performance overhead. This paper discovers two common factors that contribute to the inefficiency of logging: (1) load imbalance among memory banks, and (2) constraints of intra-record ordering. Over-loaded memory banks may significantly prolong the waiting time of log requests targeting these banks. To address this issue, we propose a novel log entry allocation scheme (LALEA) that reshapes the traffic distribution over PM banks. In addition, the intra-record ordering between a header and its log entries decreases the degree of parallelism in log operations. We design a log metadata buffering scheme (BLOM) that totally eliminates the intra-record ordering constraints. These two proposed log optimizations are general and can be applied to many existing designs. We evaluate our designs using both micro-benchmarks and real PM applications. Our experimental results show that LALEA and BLOM can achieve 54.04% and 17.16% higher transaction throughput on average, compared to two state-of-the-art designs, respectively. |
15:48 CET | 12.4.3 | (Best Paper Award Candidate) MEMPOOL-3D: BOOSTING PERFORMANCE AND EFFICIENCY OF SHARED-L1 MEMORY MANY-CORE CLUSTERS WITH 3D INTEGRATION Speaker: Matheus Cavalcante, ETH Zürich, CH Authors: Matheus Cavalcante1, Anthony Agnesina2, Samuel Riedel1, Moritz Brunion3, Alberto Garcia-Ortiz4, Dragomir Milojevic5, Francky Catthoor5, Sung Kyu Lim2 and Luca Benini6 1ETH Zürich, CH; 2Georgia Tech, US; 3University of Bremen, DE; 4ITEM (U.Bremen), DE; 5IMEC, BE; 6Università di Bologna and ETH Zürich, IT Abstract Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in the interconnects' length through their smaller form factor. We can leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected with a low-latency interconnect. MemPool's baseline 2D design is severely limited by routing congestion and wire propagation delay, making the design ideal for 3D integration. In architectural terms, we increase MemPool's scratchpad memory capacity beyond the sweet spot for 2D designs, improving performance in a common digital signal processing kernel. We propose a 3D MemPool design that leverages a smart partitioning of the memory resources across two layers to balance the size and utilization of the stacked dies. In this paper, we explore the architectural and the technology parameter spaces by analyzing the power, performance, area, and energy efficiency of MemPool instances in 2D and 3D with 1 MiB, 2 MiB, 4 MiB, and 8 MiB of scratchpad memory in a commercial 28 nm technology node. We observe a performance gain of 9.1 % when running a matrix multiplication on the MemPool-3D design with 4 MiB of scratchpad memory compared to the MemPool 2D counterpart. In terms of energy efficiency, we can implement the MemPool-3D instance with 4 MiB of L1 memory on an energy budget 15 % smaller than its 2D counterpart, and even 3.7 % smaller than the MemPool-2D instance with one-fourth of the L1 scratchpad memory capacity. |
15:52 CET | 12.4.4 | REPAIR: A RERAM-BASED PROCESSING-IN-MEMORY ACCELERATOR FOR INDEL REALIGNMENT Speaker: Chin-Fu Nien, Academia Sinica, TW Authors: Ting Wu1, Chin-Fu Nien2, Kuang-Chao Chou3 and Hsiang-Yun Cheng2 1Electrical and Computer Engineering, Carnegie Mellon University, US; 2Academia Sinica, TW; 3Graduate Institute of Electronics Engineering, National Taiwan University, TW Abstract Genomic analysis has attracted a lot of interest recently since it is the key to realizing precision medicine for diseases such as cancer. Among all the genomic analysis pipeline stages, Indel Realignment is the most time-consuming and induces intensive data movements. Thus, we propose RePAIR, the first ReRAM-based processing-in-memory accelerator targeting the Indel Realignment algorithm. To further increase the computation parallelism, we design several mapping and scheduling optimization schemes. RePAIR achieves a 7443x speedup and is 27211x more energy-efficient than GATK3.8 running on a CPU server, significantly outperforming the state-of-the-art. |
15:56 CET | 12.4.5 | SIC PROCESSORS FOR EXTREME HIGH-TEMPERATURE VENUS SURFACE EXPLORATION Speaker: Heewoo Kim, University of Michigan, Ann Arbor, US Authors: Heewoo Kim, Javad Bagherzadeh and Ronald Dreslinski, University of Michigan, US Abstract Being the ‘sister planet’ of the Earth, surface exploration of Venus is expected to provide valuable scientific insights into the history and the environment of the Earth. Despite the benefits, the surface temperature of Venus, at 450C, poses a large challenge for any surface exploration. In particular, conventional Silicon electronics do not properly function under such high temperatures. Due to this constraint, the most prolonged previous surface exploration lasted only for 2 hours. Silicon Carbide (SiC) electronics, which can endure and function properly in high-temperature environments, is proposed as a strong candidate to be used in Venus surface explorations. However, this technology is still immature and associated with limiting factors, such as slower speed, power constraint, limited die area, and approximately 1,000 times longer channel than the state-of-the-art Si transistors. In this paper, we configure a computing infrastructure for high-temperature SiC-based technology, conduct design space exploration, and evaluate the performance of different SiC processors when used in Venus surface landers. Our evaluation shows that the SiC processor has an average 16.6X lower throughput than the RAD6000 Si processor used in the previous Mars rover. The Venus rover with SiC processor is expected to have a moving speed of 0.6 meters per hour and visual odometry processing time of 50 minutes. Lastly, we provide the design guidelines to improve the SiC processors at the microarchitecture and the instruction set architecture levels. |
16:00 CET | 12.4.6 | Q&A SESSION Authors: Leonidas Kosmidis1 and Thaleia Dimitra Doudali2 1Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES; 2IMDEA Software Institute, ES Abstract Questions and answers with the authors |
12.5 Bringing Robust Deep Learning to the Autonomous Edge: New Challenges and Algorithm-Hardware Solutions
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Dirk Ziegenbein, Robert Bosch GmbH, DE
Session co-chair:
Chung-Wei Lin, National Taiwan University, TW
Deep neural networks (DNNs) are increasingly deployed on autonomous edge systems for many applications, such as speech recognition, image classification, and object detection. While DNNs have proven to be effective in handling these tasks, their robustness (i.e., accuracy) can suffer post-deployment at the edge. Moreover, designing robust deep learning algorithms for the autonomous edge is highly challenging because such systems are severely resource-constrained. This session includes four invited talks that present the challenges and propose novel, lightweight algorithm-hardware co-design methods to improve DNN robustness at the edge. The first paper evaluates the effectiveness of various unsupervised DNN adaptation methods on real-world edge systems and selects the best technique in terms of accuracy, performance and energy. The second paper explores a lightweight image super-resolution technique to prevent adversarial attacks, which is also characterized on an Arm neural processing unit. The third paper tackles the loss in DNN prediction accuracy in resistive memory-based in-memory accelerators by proposing a stochastic fault-tolerant training scheme. The final paper focuses on robust distributed reinforcement learning for swarm intelligence, where it analyzes and mitigates the effect of various transient/permanent faults.
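As a hedged illustration of the test-time adaptation idea discussed in the first talk (updating only batch-norm parameters with a single backpropagation pass that minimizes prediction entropy on unlabeled data), the PyTorch sketch below performs one such adaptation step; the model, batch, and learning rate are placeholders rather than the paper's setup.

```python
# One entropy-minimization adaptation step restricted to batch-norm affine
# parameters ("BN-Tune"-style); illustrative sketch only.
import torch
import torch.nn as nn

def bn_tune_step(model: nn.Module, batch: torch.Tensor, lr: float = 1e-3) -> float:
    bn_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)) and m.weight is not None:
            bn_params += [m.weight, m.bias]       # only BN scale/shift adapt
    for p in model.parameters():
        p.requires_grad_(False)
    for p in bn_params:
        p.requires_grad_(True)

    opt = torch.optim.SGD(bn_params, lr=lr)
    logits = model(batch)                         # unlabeled test batch
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    opt.zero_grad()
    entropy.backward()                            # single backpropagation pass
    opt.step()
    return entropy.item()
```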
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 12.5.1 | UNSUPERVISED TEST-TIME ADAPTATION OF DEEP NEURAL NETWORKS AT THE EDGE: A CASE STUDY Speaker: Kshitij Bhardwaj, Lawrence Livermore National Laboratory, US Authors: Kshitij Bhardwaj, James Diffenderfer, Bhavya Kailkhura and Maya Gokhale, LLNL, US Abstract Deep learning is being increasingly used in mobile and edge autonomous systems. The prediction accuracy of deep neural networks (DNNs), however, can degrade after deployment due to encountering data samples whose distributions are different from the training samples. To continue to robustly predict, DNNs must be able to adapt themselves post-deployment. Such adaptation at the edge is challenging as new labeled data may not be available, and it has to be performed on a resource-constrained device. This paper performs a case study to evaluate the cost of test-time fully unsupervised adaptation strategies on a real-world edge platform: Nvidia Jetson Xavier NX. In particular, we adapt pretrained state-of-the-art robust DNNs (trained using data augmentation) to improve the accuracy on image classification data that contains various image corruptions. During this prediction-time on-device adaptation, the model parameters of a DNN are updated using a single backpropagation pass while optimizing entropy loss. The effects of the following three simple model updates are compared in terms of accuracy, adaptation time and energy: updating only convolutional (Conv-Tune); only fully-connected (FC-Tune); and only batch-norm parameters (BN-Tune). Our study shows that BN-Tune and Conv-Tune are more effective than FC-Tune in terms of improving accuracy for corrupted image data (average of 6.6%, 4.97%, and 4.02%, respectively, over no adaptation). However, FC-Tune leads to a significantly faster and more energy-efficient solution with a small loss in accuracy. Even when using FC-Tune, the extra overheads of on-device fine-tuning are significant when tight real-time deadlines (209 ms) must be met. This study motivates the need for designing hardware-aware robust algorithms for efficient on-device adaptation at the autonomous edge. |
15:50 CET | 12.5.2 | SUPER-EFFICIENT SUPER RESOLUTION FOR FAST ADVERSARIAL DEFENSE AT THE EDGE Speaker: Kartikeya Bhardwaj, Arm Inc., US Authors: Kartikeya Bhardwaj1, Dibakar Gope2, James Ward3, Paul Whatmough2 and Danny Loh4 1Arm Inc., US; 2Arm Research, US; 3Arm Inc., IE; 4Arm Inc., GB Abstract Autonomous systems are highly vulnerable to a variety of adversarial attacks on Deep Neural Networks (DNNs). Training-free model-agnostic defenses have recently gained popularity due to their speed, ease of deployment, and ability to work across many DNNs. To this end, a new technique has emerged for mitigating attacks on image classification DNNs, namely, preprocessing adversarial images using super resolution -- upscaling low-quality inputs into high-resolution images. This defense requires running both image classifiers and super resolution models on constrained autonomous systems. However, super resolution incurs a heavy computational cost. Therefore, in this paper, we investigate the following question: Does the robustness of image classifiers suffer if we use tiny super resolution models? To answer this, we first review a recent work called Super-Efficient Super Resolution (SESR) that achieves similar or better image quality than prior art while requiring 2x to 330x fewer Multiply-Accumulate (MAC) operations. We demonstrate that despite being orders of magnitude smaller than existing models, SESR achieves the same level of robustness as significantly larger networks. Finally, we estimate end-to-end performance of super resolution-based defenses on a commercial Arm Ethos-U55 micro-NPU. Our findings show that SESR achieves nearly 3x higher FPS than a baseline while achieving similar robustness. |
16:00 CET | 12.5.3 | FAULT-TOLERANT DEEP NEURAL NETWORKS FOR PROCESSING-IN-MEMORY BASED AUTONOMOUS EDGE SYSTEMS Speaker: Xue Lin, Northeastern University, US Authors: Siyue Wang1, Geng Yuan1, Xiaolong Ma1, Yanyu Li1, Xue Lin1 and Bhavya Kailkhura2 1Northeastern University, US; 2LLNL, US Abstract In-memory deep neural network (DNN) accelerators will be the key for energy-efficient autonomous edge systems. The resistive random access memory (ReRAM) is a potential solution for the non-CMOS-based in-memory computing platform for energy-efficient autonomous edge systems, thanks to its promising characteristics, such as near-zero leakage-power and non-volatility. However, due to the hardware instability of ReRAM, the weights of the DNN model may deviate from the originally trained weights, resulting in accuracy loss. To mitigate this undesirable accuracy loss, we propose two stochastic fault-tolerant training methods to generally improve the models' robustness without dealing with individual devices. Moreover, we propose Stability Score -- a comprehensive metric that serves as an indicator to the instability problem. Extensive experiments demonstrate that the DNN models trained using our proposed stochastic fault-tolerant training method achieve superior performance, which provides better flexibility, scalability, and deployability of ReRAM on the autonomous edge systems. |
16:10 CET | 12.5.4 | FRL-FI: TRANSIENT FAULT ANALYSIS FOR FEDERATED REINFORCEMENT LEARNING-BASED NAVIGATION SYSTEMS Speaker: Arijit Raychowdhury, Georgia Institute of Technology, US Authors: Zishen Wan1, Aqeel Anwar1, Abdulrahman Mahmoud2, Tianyu Jia3, Yu-Shun Hsiao2, Vijay Reddi2 and Arijit Raychowdhury1 1Georgia Institute of Technology, US; 2Harvard University, US; 3Carnegie Mellon University, US Abstract Swarm intelligence is being increasingly deployed in autonomous systems, such as drones and unmanned vehicles. Federated reinforcement learning (FRL), a key swarm intelligence paradigm where agents interact with their own environments and cooperatively learn a consensus policy while preserving privacy, has recently shown potential advantages and gained popularity. However, transient faults are increasing in the hardware system with continuous technology node scaling and can pose threats to FRL systems. Meanwhile, conventional redundancy-based protection methods are challenging to deploy on resource-constrained edge applications. In this paper, we experimentally evaluate the fault tolerance of FRL navigation systems at various scales with respect to fault models, fault locations, learning algorithms, layer types, communication intervals, and data types at both training and inference stages. We further propose two cost-effective fault detection and recovery techniques that can achieve up to 3.3x improvement in resilience with <2.7% overhead in FRL systems. |
16:20 CET | 12.5.5 | Q&A SESSION Authors: Dirk Ziegenbein1 and Chung-Wei Lin2 1Robert Bosch GmbH, DE; 2National Taiwan University, TW Abstract Questions and answers with the authors |
13.1 New Perspectives in Test and Diagnosis
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Melanie Schillinsky, NXP Semiconductors Germany GmbH, DE
Session co-chair:
Riccardo Cantoro, Politecnico di Torino, IT
This session covers new techniques for cell-aware test, fault modeling, test and diagnosis for hardware security primitives, machine-learning enabled diagnosis for monolithic 3D circuits, as well as static compaction for SBST in GPU architectures.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 13.1.1 | IMPROVING CELL-AWARE TEST FOR INTRA-CELL SHORT DEFECTS Speaker: Dong-Zhen Lee, National Yang Ming Chiao Tung University, TW Authors: Dong-Zhen Li1, Ying-Yen Chen2, Kai-Chiang Wu3 and Chia-Tso Chao1 1National Yang Ming Chiao Tung University, TW; 2Realtek Semiconductor Corporation, TW; 3Department of Computer Science, National Chiao Tung University, TW Abstract Conventional fault models define their faulty behavior at the IO ports of standard cells with simple rules of fault activation and fault propagation. However, there still exist some defects inside a cell (intra-cell) that cannot be effectively detected by the test patterns of conventional fault models and hence become a source of DPPM. In order to further increase the defect coverage, many research works have been conducted to study the fault models resulting from different types of intra-cell defects, by SPICE-simulating each targeted defect with its equivalent circuit-level defect model. In this paper, we propose to improve cell-aware (CA) test methodology by concentrating on intra-cell bridging faults due to short defects inside standard cells. The faults extracted are based on examining the actual physical proximity of polygons in the layout of a cell, and are thus more realistic and reasonable than those (faults) determined by RC extraction. Experimental results on a set of industrial designs show that the proposed methodology can indeed improve the test quality of intra-cell bridging faults. On average, 0.36% and 0.47% increases in fault coverage can be obtained for 1-time-frame and 2-time-frame CA tests, respectively. In addition to short defects between two metal polygons, short defects among three metal polygons are also considered in our methodology for another 9.33% improvement in fault coverage. |
16:44 CET | 13.1.2 | APUF FAULTS: IMPACT, TESTING, AND DIAGNOSIS Speaker: Wenjing Rao, University of Illinois Chicago, US Authors: Natasha Devroye, Vincent Dumoulin, Tim Fox, Wenjing Rao and Yeqi Wei, University of Illinois at Chicago, US Abstract Arbiter Physically Unclonable Functions (APUFs) are hardware security primitives that exploit manufacturing randomness to generate unique digital fingerprints for ICs. This paper theoretically and numerically examines the impact of faults native to APUFs -- mask parameter faults from the design phase, or process variation (PV) during the manufacturing phase. We model them statistically, and explain quantitatively how these faults affect the resulting PUF bias and uniqueness. When given access to only a single PUF instance, we focus on abnormal delta elements that are outliers in magnitude, as this is how the statistically modeled faults manifest at the individual level. To detect such bad PUF instances and diagnose the abnormal delta elements, we propose a testing methodology which partitions a random set of challenges so that a specific delta element can be targeted, forming a perceivable bias in the responses over these sets. This low-cost approach is highly effective in detecting and diagnosing bad PUFs with abnormal delta element(s). |
16:48 CET | 13.1.3 | GRAPH NEURAL NETWORK-BASED DELAY-FAULT LOCALIZATION FOR MONOLITHIC 3D ICS Speaker: Shao-Chun Hung, Department of Electrical and Computer Engineering, Duke University, US Authors: Shao-Chun Hung, Sanmitra Banerjee, Arjun Chaudhuri and Krishnendu Chakrabarty, Duke University, US Abstract Monolithic 3D (M3D) integration is a promising technology for achieving high performance and low power consumption. However, the limitations of current M3D fabrication flows lead to performance degradation of devices in the top tier and unreliable interconnects between tiers. Fault localization at the tier level is therefore necessary to enhance yield learning. For example, tier-level localization can enable targeted diagnosis and process optimization efforts. In this paper, we develop a graph neural network-based diagnosis framework to efficiently localize faults to a device tier. The proposed framework can be used to provide rapid feedback to the foundry and help enhance the quality of diagnosis reports generated by commercial tools. Results for four M3D benchmarks, with and without response compaction, show that the proposed solution achieves up to 39.19% improvement in diagnostic resolution with less than 1% loss of accuracy, compared to results from commercial tools. |
16:52 CET | 13.1.4 | A COMPACTION METHOD FOR STLS FOR GPU IN-FIELD TEST Speaker: Juan David Guerrero Balaguera, Politecnico di Torino, IT Authors: Juan Guerrero Balaguera, Josie Rodriguez Condia and Matteo Sonza Reorda, Politecnico di Torino, IT Abstract Nowadays, Graphics Processing Units (GPUs) are effective platforms for implementing complex algorithms (e.g., for Artificial Intelligence) in different domains (e.g., automotive and robotics), where massive parallelism and high computational effort are required. In some domains, strict safety-critical requirements exist, mandating the adoption of mechanisms to detect faults during the operational phases of a device. An effective test solution is based on Self-Test Libraries (STLs) aiming at testing devices functionally. This solution is frequently adopted for CPUs, but can also be used with GPUs. Nevertheless, the in-field constraints restrict the size and duration of acceptable STLs. This work proposes a method to automatically compact the test programs of a given STL targeting GPUs. The proposed method combines a multi-level abstraction analysis resorting to logic simulation to extract the microarchitectural operations triggered by the test program and the information about the thread-level activity of each instruction and to fault simulation to know its ability to propagate faults to an observable point. The main advantage of the proposed method is that it requires a single fault simulation to perform the compaction. The effectiveness of the proposed approach was evaluated, resorting to several test programs developed for an open-source GPU model (FlexGripPlus) compatible with NVIDIA GPUs. The results show that the method can compact test programs by up to 98.64% in code size and by up to 98.42% in terms of duration, with minimum effects on the achieved fault coverage. |
16:56 CET | 13.1.5 | Q&A SESSION Authors: Melanie Schillinsky1 and Riccardo Cantoro2 1NXP Germany GmbH, DE; 2Politecnico di Torino, IT Abstract Questions and answers with the authors |
13.2 From system-level specification to RTL and back
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Andy Pimentel, University of Amsterdam, NL
Session co-chair:
Matthias Jung, Fraunhofer IESE, DE
This session highlights the importance of system modeling for efficient design. The first three papers showcase solutions for generating system-level models from RTL descriptions and back. The last paper presents a cost-sensitive model and learning engine for disk failure prediction that reduces misclassification costs while maintaining a high fault detection rate.
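For readers unfamiliar with cost-sensitive learning as used in the last paper, the sketch below shows the general idea with scikit-learn on synthetic data; the feature layout, class weights, and model choice are placeholders and are not the CSLE design itself.

```python
# Cost-sensitive disk-failure classifier sketch: missing a failing disk is
# penalized much more heavily than raising a false alarm.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))               # 12 SMART-like features (synthetic)
y = (rng.random(5000) < 0.02).astype(int)     # ~2% failing disks (imbalanced)

clf = RandomForestClassifier(
    n_estimators=200,
    class_weight={0: 1.0, 1: 50.0},           # assumed misclassification costs
    random_state=0,
)
clf.fit(X, y)
print("predicted failure rate:", clf.predict(X).mean())
```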
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 13.2.1 | AUTOMATIC GENERATION OF ARCHITECTURE-LEVEL MODELS FROM RTL DESIGNS FOR PROCESSORS AND ACCELERATORS Speaker: Yu Zeng, Princeton University, US Authors: Yu Zeng, Aarti Gupta and Sharad Malik, Princeton University, US Abstract Hardware platforms comprise general-purpose processors and application-specific accelerators. Unlike processors, application-specific accelerators often do not have clearly specified architecture-level models/specifications (the instruction set architecture or ISA). This poses challenges to the development and verification/validation of firmware/software for these accelerators. Manually writing architecture-level models takes great effort and is error-prone. When Register-Transfer Level (RTL) designs are available, they can be a source from which to automatically derive the architecture-level models. In this work, we propose an approach for automatically generating architecture-level models for processors as well as accelerators from their RTL designs. In previous work, we showed how to automatically extract the architectural state variables (ASVs) from RTL designs. (These are the state variables that are persistent across instructions.) In this work, we present an algorithm for generating the update functions of the model: how the ASVs and outputs are updated by each instruction. Experiments on several processors and accelerators demonstrate that our approach can cover a wide range of hardware features and generate high-quality architecture-level models within reasonable computing time. |
16:44 CET | 13.2.2 | TWINE: A CHISEL EXTENSION FOR COMPONENT-LEVEL HETEROGENEOUS DESIGN Speaker: Shibo Chen, University of Michigan, US Authors: Shibo Chen, Yonathan Fisseha, Jean-Baptiste Jeannin and Todd Austin, University of Michigan, US Abstract Algorithm-oriented heterogeneous hardware design has been one of the major driving forces for hardware improvement in the post-Moore's Law era. To achieve the swift development of heterogeneous designs, designers reuse existing hardware components to craft their systems. However, current hardware design languages either require tremendous efforts to customize designs, or sacrifice quality for simplicity. Chisel, while attracting more users for its capability to easily reconfigure designs, lacks a few key features to further expedite the heterogeneous design flow. In this paper, we introduce Twine—a Chisel extension that provides high-level semantics to efficiently generate heterogeneous designs. Twine standardizes the interface for better reusability and supports control-free specification with flexible data type conversion, which saves designers from the busy-work of interconnecting modules. Our results show that Twine provides a smooth on-boarding experience for hardware designers, considerably improves reusability, and reduces design complexity for heterogeneous designs while maintaining high design quality. |
16:48 CET | 13.2.3 | TOWARDS IMPLEMENTING RTL MICROPROCESSOR AGILE DESIGN USING FEATURE ORIENTED PROGRAMMING Speaker: Tun Li, National University of Defense Technology, CN Authors: Hongji Zou, Mingchuan Shi, Tun Li and Wanxia Qu, National University of Defense Technology, CN Abstract Recently, hardware agile design methods have been developed to improve the design productivity. However, the modeling methods hinder further design productivity improvements. In this paper, we propose and implement a microprocessor agile design method using feature oriented programming technology to improve design productivity. In this method, designs could be uniquely partitioned and constructed incrementally to explore various functional design features flexibly and efficiently. The key techniques to improve design productivity are flexible modeling extension and on-the-fly feature composing mechanisms. The evaluations on RISC-V and OR1200 CPU pipelines show the effectiveness of the proposed method on duplicate codes reduction and flexible feature composing while avoiding design resource overheads. |
16:52 CET | 13.2.4 | CSLE: A COST-SENSITIVE LEARNING ENGINE FOR DISK FAILURE PREDICTION IN LARGE DATA CENTERS Speaker: Xinyan Zhang, Huazhong University of Science and Technology, CN Authors: Xinyan Zhang1, Kai Shan2, Zhipeng Tan3 and Dan Feng3 1Wuhan National Laboratory for Optoelectronics, Huazhong University of Science & Technology, CN; 2Huawei Technologies, CN; 3Huazhong University of Science and Technology, CN Abstract As the principal failure in data centers, disk failure may pose the risk of data loss, increase the maintenance cost, and affect system availability. As a proactive fault tolerance technology, disk failure prediction can minimize the loss before failure occurs. However, a weak prediction model with a low Failure Detection Rate (FDR) and high False Alarm Rate (FAR) may substantially increase the system cost due to inadequate consideration or misperception of the misclassification cost. To address these challenges, we propose a cost-sensitive learning engine CSLE for disk failure prediction, which combines a two-phase feature selection based on Cohen’s D and Genetic Algorithm, a meta-algorithm based on cost-sensitive learning, and an adaptive optimal classifier for heterogeneous and homogeneous disk series. Experimental results on real datasets show that the AUC of CSLE is increased by 2%-42% compared with the commonly used rank-sum test. CSLE can reduce the misclassification cost by 52%-96% compared with the rank model. Besides, CSLE has better pervasiveness than the traditional prediction model: it can reduce both the misclassification cost and the FAR by 16%-70% for heterogeneous disk series, and increase the FDR by 3%-29% for homogeneous disk series. |
16:56 CET | 13.2.5 | Q&A SESSION Authors: Andy Pimentel1 and Matthias Jung2 1University of Amsterdam, NL; 2Fraunhofer IESE, DE Abstract Questions and answers with the authors |
13.3 Advances in permanent storage efficiency and NN-in-memory
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Yi Wang, Shenzhen University, CN
Session co-chair:
Zili Shao, The Chinese University of Hong Kong, HK
In this session we present several hardware- and software-based advances in permanent storage. The solutions build on technologies such as emerging persistent memories, flash, and shingled magnetic recording (SMR) disks to improve the overall bandwidth, latency, capacity, and resilience of permanent storage. They do so by analyzing current bottlenecks and combining several of these technologies to raise performance at the overall system level, by developing a new framework that revisits FTL firmware organization for future open-source multicore architectures, and by presenting a robust implementation of binary neural networks for computing-in-memory.
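As context for the binary-neural-network paper that closes this session, the following sketch shows how a BNN reduces a multiply-accumulate to an XNOR-style operation plus a popcount on bit-packed ±1 vectors; it is a plain software illustration, whereas the paper targets a noisy analog computing-in-memory realization of this idea.

```python
# Toy binary-neural-network dot product on bit-packed +/-1 vectors.
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n +/-1 vectors packed LSB-first as integers."""
    disagreements = bin((a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * disagreements              # agreements minus disagreements

# +1,+1,-1,+1 (packed 0b1011) against +1,-1,-1,+1 (packed 0b1001):
# element-wise products are +1,-1,+1,+1, so the dot product is 2.
assert binary_dot(0b1011, 0b1001, 4) == 2
```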
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 13.3.1 | ROBUST BINARY NEURAL NETWORK AGAINST NOISY ANALOG COMPUTATION Speaker: Zong-Han Lee, National Tsing-Hua University, TW Authors: Zong-Han Lee1, Fu-Cheng Tsai2 and Shih-Chieh Chang1 1National Tsing-Hua University, TW; 2Industrial Technology Research Institute, TW Abstract Computing in memory (CIM) technology has shown promising results in reducing the energy consumption of a battery-powered device. On the other hand, to reduce MAC operations, Binary neural networks (BNN) show the potential to catch up with a full-precision model. This paper proposes a robust BNN model applied to the CIM framework, which can tolerate analog noises. These analog noises caused by various variations, such as process variation, can lead to low inference accuracy. We first observe that the traditional batch normalization can cause a BNN model to be susceptible to analog noise. We then propose a new approach to replace the batch normalization while maintaining the advantages. Secondly, in BNN, since noises can be removed when inputs are zeros during the multiplication and accumulation (MAC) operation, we also propose novel methods to increase the number of zeros in a convolution output. We apply our new BNN model in the keyword spotting application. Our results are very exciting. |
16:44 CET | 13.3.2 | (Best Paper Award Candidate) MU-RMW: MINIMIZING UNNECESSARY RMW OPERATIONS IN THE EMBEDDED FLASH WITH SMR DISK Speaker: Chenlin Ma, Shenzhen University, CN Authors: Chenlin Ma, Zhuokai Zhou, Yingping Wang, Yi Wang and Rui Mao, Shenzhen University, CN Abstract Emerging Shingled Magnetic Recording (SMR) Disk can improve the storage capacity significantly by overlapping multiple tracks with the shingled direction. However, the shingled-like structure leads to severe write amplification caused by RMW operations inner SMR disks. As the mainstream solid-state storage technology, NAND flash has the advantages of tiny size, cost-effective, high performance, making it suitable and promising to be incorporated into SMR disks to boost the system performance. In this hybrid embedded storage system (i.e., the Embedded Flash with SMR disk (EF-SMR) system), we observe that physical flash blocks can contain a mixture of data associated with different SMR data bands; when garbage collecting such flash blocks, multiple RMW operations are triggered to rewrite the involved SMR bands and the performance is further exacerbated. Therefore, in this paper, we for the first time present MU-RMW to guarantee data from different SMR bands will not be mixed up within the flash blocks with an aim at minimizing unnecessary RMW operations. The effectiveness of MU-RMW was evaluated with realistic and intensive I/O workloads and the results are encouraging. |
16:48 CET | 13.3.3 | OPTIMIZING COW-BASED FILE SYSTEMS ON OPEN-CHANNEL SSDS WITH PERSISTENT MEMORY Speaker: Runyu Zhang, Chongqing University, CN Authors: Runyu Zhang1, Duo Liu2, Chaoshu Yang3, Xianzhang Chen2, Lei Qiao4 and Yujuan Tan2 1College of Computer Science, Chongqing University, CN; 2Chongqing University, CN; 3Guizhou University, CN; 4Beijing Institute of Control Engineering, CN Abstract Block-based file systems, such as Btrfs, utilize the copy-on-write (CoW) mechanism to guarantee data consistency on solid-state drives (SSDs). Open-channel SSD provides opportunities for in-depth optimization of block-based file systems. However, existing systems fail to co-design the two-layer semantics and cannot take full advantage of the open-channel characteristics. Specifically, synchronizing an overwrite in Btrfs will copy-on-write all pages in the update path and induce severe write amplification. In this paper, we propose a hybrid fine-grained copy-on-write and journaling mechanism (HyFiM) to address these problems. We first utilize persistent memories to preserve the address mapping table of open-channel SSD. Then, we design an intra-FTL copy-on-write mechanism (IFCoW) that eliminates the recursive updates caused by overwrites. Finally, we devise fine-grained metadata journals (FGMJ) to guarantee the consistency of metadata with minimum overhead. We prototype HyFiM based on Btrfs in the Linux kernel. Comprehensive evaluations demonstrate that HyFiM can outperform over Btrfs by 30.77% and 33.82% for sequential and random overwrites, respectively. |
16:52 CET | 13.3.4 | MCMQ: SIMULATION FRAMEWORK FOR SCALABLE MULTI-CORE FLASH FIRMWARE OF MULTI-QUEUE SSDS Speaker: Jin Xue, The Chinese University of Hong Kong, HK Authors: Jin Xue, Tianyu Wang and Zili Shao, The Chinese University of Hong Kong, HK Abstract Solid-state drives (SSDs) have been used in a wide range of emerging data processing systems. To fully utilize the massive internal parallelism delivered by SSDs, manufacturers have begun to utilize high-performance multi-core microprocessors in scalable flash firmware to process I/O requests concurrently. Designing scalable multi-core flash firmware requires simulation tools that can model the features of a multi-core environment. However, existing SSD simulators assume a single-threading execution model and are not capable of modelling overheads incurred by multi-threading firmware execution such as lock contention. In this paper, we propose MCMQ, a novel framework for simulating scalable multi-core flash firmware. The framework is based on an emulated multi-core RISC processor and supports executing multiple I/O traces in parallel through a multi-queue interface. Experiment results show the effectiveness of the proposed framework. We have released the open-source code of MCMQ for public access. |
16:56 CET | 13.3.5 | Q&A SESSION Authors: Yi Wang1 and Zili Shao2 1Shenzhen University, CN; 2The Chinese University of Hong Kong, HK Abstract Questions and answers with the authors |
13.4 System-level security
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Pascal Benoit, University of Montpellier, FR
Session co-chair:
Mike Hamburg, Cryptography Research, US
The session focuses on security from a high-level perspective. It covers improvements to Intel Software Guard Extensions (one ensuring that pages are available in secure memory when needed, and another extending an existing secure key-value store), new protections against transient execution and fault injection attacks, and a new dynamic attack that can evade hardware-assisted attack/intrusion detection.
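As background on the hardware-assisted detection (HID) setting that the first talk attacks, the sketch below trains an anomaly detector over synthetic hardware performance counter readings; the counter selection, value ranges, and model are assumptions, not the detection pipeline evaluated in the paper.

```python
# Illustrative HPC-based anomaly detector (synthetic data, assumed features).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Rows = profiled execution intervals; columns = counters such as cache
# misses, branch mispredictions, and TLB misses (values are synthetic).
benign = rng.normal(loc=[1e5, 2e3, 5e2], scale=[1e4, 2e2, 50.0], size=(1000, 3))
detector = IsolationForest(contamination=0.01, random_state=0).fit(benign)

suspect = np.array([[4.0e5, 9.0e3, 2.5e3]])   # unusually high counter values
print("flagged as anomalous:", detector.predict(suspect)[0] == -1)
```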
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 13.4.1 | CR-SPECTRE: DEFENSE-AWARE ROP INJECTED CODE-REUSE BASED DYNAMIC SPECTRE Speaker: Abhijitt Dhavlle, George Mason University, US Authors: Abhijitt Dhavlle1, Setareh Rafatirad2, Houman Homayoun2 and Sai Manoj Pudukotai Dinakarrao3 1George Mason University, US; 2University of California Davis, US; 3George Mason University, US Abstract Side-channel attacks have been a constant threat to computing systems. In recent times, vulnerabilities in the architecture were discovered and exploited to mount and execute a state-of-the-art attack such as Spectre. The Spectre attack exploits a vulnerability in Intel-based processors to leak confidential data through a covert channel. There exist some defenses to mitigate the Spectre attack. Among multiple defenses, hardware-assisted attack/intrusion detection (HID) systems have received an overwhelming response due to their low overhead and efficient attack detection. The HID systems deploy machine learning (ML) classifiers to perform anomaly detection to determine whether the system is under attack. For this purpose, a performance monitoring tool profiles the applications to record hardware performance counters (HPCs), which are then used for anomaly detection. Previous HID systems assume that Spectre is executed as a standalone application. In contrast, we propose an attack that dynamically generates variations in the injected code to evade detection. The attack is injected into a benign application. In this manner, the attack conceals itself as a benign application and generates perturbations to avoid detection. For the attack injection, we exploit a return-oriented programming (ROP)-based code-injection technique that reuses the code, called gadgets, present in the exploited victim's (host) memory to execute the attack, which, in our case, is the CR-Spectre attack to steal sensitive data from a target victim (target) application. Our work focuses on proposing a dynamic attack that can evade HID detection by injecting perturbations, and its dynamically generated variations thereof, under the cloak of a benign application. We evaluate the proposed attack on the MiBench suite as the host. From our experiments, the HID performance degrades from 90% to 16%, indicating our CR-Spectre attack avoids detection successfully. |
16:44 CET | 13.4.2 | CACHEREWINDER: REVOKING SPECULATIVE CACHE UPDATES EXPLOITING WRITE-BACK BUFFER Speaker: Jongmin Lee, Korea University, KR Authors: Jongmin Lee1, Junyeon Lee2, Taeweon Suh1 and Gunjae Koo1 1Korea University, KR; 2Samsung Advanced Institute of Technology, KR Abstract Transient execution attacks are critical security threats since those attacks exploit speculative execution which is an essential architectural solution that can improve the performance of out-of-order processors significantly. Such attacks change cache state by accessing secret data during speculative executions, then the attackers leak the secret information exploiting cache timing side-channels. Even though software patches against transient execution attacks have been proposed, the software solutions significantly slow down the performance of a system. In this paper, we propose CacheRewinder, an efficient hardware-based defense mechanism against transient execution attacks. CacheRewinder prevents leakage of secret information by revoking the cache updates done by speculative executions. To restore the cache state efficiently, CacheRewinder exploits the underutilized write-back buffer space as the temporary storage for victimized cache blocks that are evicted during speculative executions. Hence when speculation fails CacheRewinder can quickly restore the cache state using the evicted cache blocks held in the write-back buffer. Our evaluation exhibits that CacheRewinder can effectively defend the transient execution attacks. The performance overhead by CacheRewinder is only 0.6%, which is negligible compared to the unprotected baseline processor. CacheRewinder also requires minimal storage cost since it exploits unused write-back buffer entries as storage for evicted cache blocks. |
16:48 CET | 13.4.3 | SAFETEE: COMBINING SAFETY AND SECURITY ON ARM-BASED MICROCONTROLLERS Speaker: Martin Schönstedt, TU Darmstadt, DE Authors: Martin Schönstedt, Ferdinand Brasser, Patrick Jauernig, Emmanuel Stapf and Ahmad-Reza Sadeghi, TU Darmstadt, DE Abstract From industry automation to smart home, embedded devices are already ubiquitous, and the number of applications continues to grow rapidly. However, the plethora of embedded devices used in these systems leads to considerable hardware and maintenance costs. To reduce these costs, it is necessary to consolidate applications and functionalities that are currently implemented on individual embedded devices. Especially in mixed-criticality systems, consolidating applications on a single device is highly challenging and requires strong isolation to ensure the security and safety of each application. Existing isolation solutions, such as partitioning designs for ARM-based microcontrollers, do not meet these requirements. In this paper, we present SafeTEE, a novel approach to enable security- and safety-critical applications on a single embedded device. We leverage hardware mechanisms of commercially available ARM-based microcontrollers to strongly isolate applications on individual cores. This makes SafeTEE the first solution to provide strong isolation for multiple applications in terms of security as well as safety. We thoroughly evaluate our prototype of SafeTEE for the most recent ARM microcontrollers using a standard microcontroller benchmark suite. |
16:52 CET | 13.4.4 | Q&A SESSION Authors: Pascal Benoit1 and Mike Hamburg2 1University of Montpellier, FR; 2Cryptography Research, US Abstract Questions and answers with the authors |
13.5 Safe and Efficient Engineering of Autonomous Systems
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Sebastian Steinhorst, TU Munich, DE
Session co-chair:
Sharon Hu, University of Notre Dame, US
This session discusses novel approaches for engineering autonomous systems, considering safety and validation aspects as well as efficiency. The first paper uses ontology-based perception for autonomous vehicles, which enables a comprehensive safety analysis; the second paper relies on formal approaches for generating relevant critical scenarios for automated driving. The last paper proposes an efficient method for recharging unmanned aerial vehicles (UAVs) to perform large-scale remote sensing with maximal coverage.
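Purely as an illustration of the coverage-partitioning idea behind the last paper (and not its actual two-stage algorithm or Diffusion Heuristic), the sketch below splits hypothetical points of interest into per-sortie clusters whose centroids could serve as candidate rendezvous locations with the mobile recharge vehicle.

```python
# Partition sensing targets into one cluster per battery discharge cycle.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
points = rng.uniform(0, 1000, size=(300, 2))  # synthetic points of interest (m)

n_sorties = 8                                 # assumed number of discharge cycles
labels = KMeans(n_clusters=n_sorties, n_init=10, random_state=0).fit_predict(points)
for k in range(n_sorties):
    centroid = points[labels == k].mean(axis=0)
    print(f"sortie {k}: {np.sum(labels == k)} targets, rendezvous near {centroid.round(1)}")
```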
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 13.5.1 | USING ONTOLOGIES FOR DATASET ENGINEERING IN AUTOMOTIVE AI APPLICATIONS Speaker: Martin Herrmann, Robert Bosch GmbH, DE Authors: Martin Herrmann1, Christian Witt2, Laureen Lake1, Stefani Guneshka3, Christian Heinzemann1, Frank Bonarens4, Patrick Feifel4 and Simon Funke5 1Robert Bosch GmbH, DE; 2Valeo Schalter und Sensoren GmbH, DE; 3Understand AI, DE; 4Stellantis, Opel Automobile GmbH, DE; 5Understand AI, DE Abstract The basis of a robust safety strategy for an automated driving function based on neural networks is a detailed description of its input domain, i.e. a description of the environment in which the function is used. This is required to describe its functional system boundaries and to perform a comprehensive safety analysis. Moreover, it allows datasets to be tailored specifically for safety-related validation tests. Ontologies gather expert knowledge and model information to enable computer-aided processing, while using a notation understandable to humans. In this contribution, we propose a methodology for domain analysis to build up an ontology for the perception of autonomous vehicles, including characteristic features that become important when dealing with neural networks. Additionally, the method is demonstrated by the creation of a synthetic test dataset for a Euro NCAP-like use case. |
16:53 CET | 13.5.2 | USING FORMAL CONFORMANCE TESTING TO GENERATE SCENARIOS FOR AUTONOMOUS VEHICLES Speaker: Lucie Muller, INRIA, FR Authors: Jean-Baptiste Horel1, Christian Laugier1, Lina Marsso2, Radu Mateescu3, Lucie Muller3, Anshul Paigwar1, Alessandro Renzaglia1 and Wendelin Serwe3 1University Grenoble Alpes, Inria, FR; 2University of Toronto, CA; 3INRIA, FR Abstract Simulation, a common practice for evaluating autonomous vehicles, requires specifying realistic scenarios, in particular critical ones, which correspond to corner-case situations that occur rarely and are potentially dangerous to reproduce in real environments. Such simulation scenarios may be either generated randomly or specified manually. Randomly generated scenarios are easy to produce, but their relevance might be difficult to assess, for instance when many slightly different scenarios target one feature. Manually specified scenarios can focus on a given feature, but their design might be difficult and time-consuming, especially to achieve satisfactory coverage. In this work, we propose an automatic approach to generate a large number of relevant critical scenarios for autonomous driving simulators. The approach is based on the generation of behavioural conformance tests from a formal model (specifying the ground-truth configuration with the range of vehicle behaviours) and a test purpose (specifying the critical feature to focus on). The obtained abstract test cases cover, by construction, all possible executions exercising a given feature, and can be automatically translated into the inputs of autonomous driving simulators. We illustrate our approach by generating hundreds of behaviour trees for the CARLA simulator for several realistic configurations. |
17:06 CET | 13.5.3 | REMOTE SENSING WITH UAV AND MOBILE RECHARGING VEHICLE RENDEZVOUS Speaker: Michael Ostertag, University of California, San Diego, US Authors: Michael Ostertag1, Jason Ma1 and Tajana S. Rosing2 1University of California, San Diego, US; 2UCSD, US Abstract Small unmanned aerial vehicles (UAVs) equipped with sensors offer an effective way to perform high-resolution environmental monitoring in remote areas but suffer from limited battery life. In order to perform large-scale remote sensing, a UAV must cover the area using multiple discharge cycles. A practical and efficient method to achieve full coverage is for the sensing UAV to rendezvous with a mobile recharge vehicle (MRV) for a battery exchange, which is an NP-hard problem. Existing works tackle this problem using slow genetic algorithms or greedy heuristics. We propose an alternative approach: a two-stage algorithm that iterates between dividing a region into independent subregions aligned to MRV travel and a new Diffusion Heuristic that performs a local exchange of points of interest between neighboring subregions. The algorithm outperforms existing state-of-the-art planners for remote sensing applications, creating more fuel efficient paths that better align with MRV travel. |
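The abstract of 13.5.3 above describes a two-stage structure: partition the region into subregions aligned to MRV travel, then locally exchange points of interest between neighbouring subregions (the Diffusion Heuristic). The snippet below is a heavily simplified, hypothetical sketch of that partition-then-exchange pattern; the real cost model, MRV alignment, and heuristic are far richer, and every function here is invented for illustration.

```python
# Toy sketch of a two-stage "partition then locally exchange" planner, loosely
# inspired by the abstract above. Everything here is an invented illustration.
import random

def partition(points, num_subregions):
    """Stage 1: split points of interest into bands along the MRV travel axis."""
    pts = sorted(points, key=lambda p: p[0])
    size = len(pts) // num_subregions
    return [pts[i * size:(i + 1) * size] for i in range(num_subregions - 1)] + \
           [pts[(num_subregions - 1) * size:]]

def workload(subregion):
    """Stand-in cost: number of points a UAV must visit in one discharge cycle."""
    return len(subregion)

def diffuse(subregions, rounds=10):
    """Stage 2: move single points between neighbouring subregions whenever
    that reduces the workload imbalance (a crude local exchange)."""
    for _ in range(rounds):
        for i in range(len(subregions) - 1):
            a, b = subregions[i], subregions[i + 1]
            if workload(a) > workload(b) + 1 and a:
                b.append(a.pop())          # shift one point towards the lighter side
            elif workload(b) > workload(a) + 1 and b:
                a.append(b.pop())
    return subregions

random.seed(0)
points = [(random.random(), random.random()) for _ in range(23)]
subregions = diffuse(partition(points, 4))
print([workload(s) for s in subregions])   # workloads after local exchange, roughly balanced
```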
A.1 Panel on Quantum and Neuromorphic Computing: Designing Brain-Inspired Chips
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 17:30 CET - 19:00 CET
Session chair:
Aida Todri Sanial, LIRMM, FR
Session co-chair:
Anne Matsuura, Intel, US
Panellists:
Bhavin J. Shastri, Queen’s University, CA
Giacomo Indiveri, ETH Zürich, CH
Mike Davies, INTEL, US
In this session, invited speakers from industry and academia will cover topics ranging from neuro-inspired computing chips, neuromorphic engineering and photonics to organic electronics for neuromorphic computing.
14.1 University Fair
Add this session to my calendar
Date: Thursday, 17 March 2022
Time: 19:00 CET - 20:30 CET
Session chair:
Ioannis Sourdis, Chalmers, SE
Session co-chair:
Nele Mentens, KU Leuven, BE
The University Fair is a forum for disseminating academic research activities. Its goal is twofold:
(1) to foster the transfer of mature academic work to a large audience of industrial parties.
(2) to advertise new or upcoming research plans associated with new open research positions to a large audience of graduate students.
To this end, the University Fair program includes talks that describe (1) pre-commercial mature academic research results and/or prototypes with technology transfer potential as well as (2) new upcoming research initiatives associated with openings of academic research positions.
Time | Label | Presentation Title Authors |
---|---|---|
19:00 CET | 14.1.1 | CHALMERS ACTIVITIES IN EUROHPC JU Speaker and Author: Per Stenstrom, Chalmers University of Technology, SE Abstract . |
19:10 CET | 14.1.2 | HARDWARE DESIGNS FOR HIGH PERFORMANCE AND RELIABLE SPACE PROCESSORS Authors: Leonidas Kosmidis and Marc Solé Bonet, Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES Abstract . |
19:20 CET | 14.1.3 | NEW POSITION IN THE SSH TEAM OF TÉLÉCOM PARIS Speaker and Author: Jean Luc Danger, Télécom ParisTech, FR Abstract . |
19:30 CET | 14.1.4 | A TOOLCHAIN FOR LIBRARY CELL CHARACTERIZATION FOR RFET TECHNOLOGIES Speaker: Steffen Märcker, TU Dresden, DE Authors: Steffen Märcker, Akash Kumar, Michael Raitza and Shubham Rai, TU Dresden, DE Abstract . |
19:40 CET | 14.1.5 | SAFETY-RELATED OPEN SOURCE HARDWARE MODULES Speaker: Jaume Abella, Barcelona Supercomputing Center, ES Authors: Jaume Abella1, Sergi Alcaide2 and Pedro Benedicte1 1Barcelona Supercomputing Center, ES; 2Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES Abstract . |
19:50 CET | 14.1.6 | RESEARCH @NECSTLAB IN A NUTSHELL AKA RESEARCH ACTIVITIES AND OPPORTUNITIES FOR PROSPECTIVE PHD STUDENTS Speaker and Author: Marco D. Santambrogio, Politecnico di Milano, IT Abstract . |
20:00 CET | 14.1.7 | POWER-OFF LASER ATTACKS ON SECURITY PRIMITIVES Speaker and Author: Giorgio Di Natale, TIMA, FR Abstract . |
W01 European Automotive Reliability, Test and Safety (eARTS)
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:00 CET - 18:00 CET
Organisers:
Riccardo Cantoro, Politecnico di Torino, IT
Giusy Iaria, Politecnico di Torino, IT
Speakers:
Alberto Bosio, École Centrale de Lyon, FR
Fabien Bouquillon, Université de Lille, FR
Marc Hutner, proteanTecs, CA
Sarah Seifi, Infineon Technologies AG, DE
Haralampos-G. Stratigopoulos, Sorbonne Universités, CNRS, LIP6, FR
Lee Harrison, Siemens EDA, GB
Paolo Bernardi, Politecnico di Torino, IT
Teo Cupaiuolo, Synopsys, IT
Lieyi Sheng, onsemi, BE
Shivakumar Chonnad, Synopsys Inc, US
Alessandra Nardi, Cadence, US
Giovanni Corrente, STMicroelectronics, IT
Tobias Kilian, TU Munich / Infineon Technologies AG, DE
Haolan Liu, University of California, US
Adrian Evans, CEA-Leti, FR
Mayukh Bhattacharya, Synopsys, US
Yousef Abdulhammed, BMW Motorsport, DE
Information
Automotive electronics is becoming more and more relevant in daily life, especially with the advent of autonomous driving, and people will become 100% dependent on the proper operation of these electronic systems. The 2nd European Automotive Reliability, Test and Safety workshop (eARTS) focuses on test, reliability, and safety of automotive electronics, including IC design, test development, system-level integration, production testing, in-field test, diagnosis and repair solutions, cybersecurity, as well as architectures and methods for reliable, safe, and secure operation in the field.
The eARTS Workshop offers a forum for industry specialists and academic researchers to present and discuss these challenges and emerging solutions. For this second edition in the frame of the DATE conference, special focus will be given to Design-for-Test solutions and system-level test.
Topic Areas – You are invited to participate and submit your contributions to the eARTS Workshop. The workshop’s areas of interest include (but are not limited to) the following topics:
- Automotive Design-for-Test: enable high quality at low cost
- Statistical post-processing, Machine Learning, and AI for test and reliability
- Latent defect activation during production testing
- Built-In Self-Test in automotive systems: digital, analog, mixed-signal
- Reuse of test infrastructure and New Product Development acceleration
- Dependability challenges of autonomous driving and e-mobility
- Functional safety and cyber-security
- Automotive standards and certification – ISO 26262, AEC-Q100
- Approximate computing for automotive
- Verification and validation of automotive systems
- Fault tolerance and self-checking circuits
- Aging effects on automotive electronics
- Power-up, power-down and periodic test
- System level test
- Functional and structural test generation
- Automotive production testing
Key dates
- Submission deadline: January 27, 2022
- Notification of acceptance: February 15, 2022
Program
Morning sessions |
|
---|---|
8:30 - 8:40 CET | Opening |
8:40 - 9:40 CET | Keynote: From combustion towards electrical cars |
9:40 - 10:00 CET | Break |
10:00 - 11:00 CET | Technical Session 1 |
11:00 - 12:00 CET | Invited Session 1: Design-for-dependability for AI hardware accelerators in the edge |
12:00 - 13:00 CET | Technical Session 2 |
13:00 - 14:30 CET | Lunch break |
Afternoon sessions |
|
---|---|
14:30 - 15:30 CET | Technical Session 3 |
15:30 - 16:10 CET | Embedded Tutorial: IEEE P2851 advancements |
16:10 - 16:30 CET | Break |
16:30 - 17:30 CET | Invited Session 2: The challenges of reaching zero defect and functional safety – and how the EDA industry tackles them |
17:30 - 18:30 CET | Panel: What are the limitations of EDA tools with respect to zero defects and FuSa? |
18:30 - 18:45 CET | Closing |
W01.0 Opening
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 08:40 CET
Chairs:
Paolo Bernardi, Politecnico di Torino, IT
Yervant Zorian, Synopsys, US
Riccardo Cantoro, Politecnico di Torino, IT
Wim Dobbelaere, onsemi, BE
W01.1 Keynote: From combustion towards electrical cars
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:40 CET - 09:40 CET
Keynote Speaker:
Riccardo Groppo, Ideas and Motion, IT
Short bio: Riccardo Groppo received his MSc degree in Electronic Engineering from the Politecnico di Torino (Torino, Italy). He is the co-founder and CEO of Ideas & Motion, a high-tech company focused on IP development on silicon and the design of complex automotive control systems for niche applications. He is the Chairman of the Transportation Working Group and Board Vice-Chairman within EPoSS (the European Platform on Smart Systems Integration). He is a member of the Technical Committee of several relevant events worldwide (SAE World Congress, AMAA Conference and Smart Systems Integration Conference). He started his career with Honeywell Bull and then joined Centro Ricerche FIAT (CRF) in 1989, where he was involved in the design of innovative engine/vehicle automotive control systems. He was a member of the CRF team that developed the first automotive Common Rail system for a direct-injection Diesel engine. He was then involved in the design and industrialization of the MultiAir technology and the dry dual-clutch transmission. He was Head of the Automotive Electronics Design and Development Department at CRF (2002-2013), where he promoted the design of IP building blocks in ASIC technology in cooperation with Freescale Semiconductor and Robert BOSCH for FIAT/Chrysler applications. Those smart drivers are the de facto standard in automotive powertrain applications, with volumes exceeding 17 million parts/year. He holds more than 31 patents in the field of automotive electronics and embedded systems, most of which are currently in production on passenger cars.
W01.T1 Technical Session 1 - Applications, Machine Learning, and System-level Test
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:00 CET - 11:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
10:00 CET | W01.T1.1 | TOWARDS FAST AND EFFICIENT SCENARIO GENERATION FOR AUTONOMOUS VEHICLES Speaker: Haolan Liu, University of California, US |
10:15 CET | W01.T1.2 | DEEP LEARNING BASED DRIVER MODEL AND FAULT DETECTION FOR AUTOMATED RACECAR SYSTEM TESTING Speaker: Yousef Abdulhammed, BMW Motorsport, DE |
10:30 CET | W01.T1.3 | UNSUPERVISED CLUSTERING OF ACOUSTIC EMISSION SIGNALS FOR SEMICONDUCTOR THIN LAYER CRACK DETECTION AND DAMAGE EVENT INTERPRETATION Speaker: Sarah Seifi, Infineon Technologies AG, DE |
10:45 CET | W01.T1.4 | ONLINE SCHEDULING OF MEMORY BISTS EXECUTION AT REAL-TIME OPERATING-SYSTEM LEVEL Speaker: Paolo Bernardi, Politecnico di Torino, IT |
W01.2 Invited Session 1: Design-for-dependability for AI hardware accelerators in the edge
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:00 CET - 12:00 CET
Organiser:
Haralampos-G. Stratigopoulos, Sorbonne Universités, CNRS, LIP6, FR
Abstract: AI has seen an explosion of real-world applications in recent years. For example, it is the backbone of self-driving and connected cars. The design of AI hardware accelerators to support intensive and memory-hungry AI workloads is an ongoing effort aimed at optimizing the energy-area trade-off. This special session will focus on dependability aspects in the design of AI hardware accelerators. It is often tacitly assumed that neural networks in hardware inherit the remarkable fault-tolerance capabilities of the biological brain. This assumption has been proven false in recent years by a number of fault-injection experiments. The three talks will cover reliability assessment and fault tolerance of Artificial Neural Networks and Spiking Neural Networks implemented in hardware, as well as the impact of approximate computing on fault-tolerance capabilities.
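The abstract above refers to fault-injection experiments showing that hardware neural networks do not automatically inherit the brain's fault tolerance. As a generic, hypothetical illustration of such an experiment (not any presenter's methodology), the sketch below flips random bits in the int8 weight memory of a tiny linear classifier and reports the resulting accuracy drop.

```python
# Generic bit-flip fault-injection experiment on quantized weights.
# Illustrative only; the session's actual fault models and networks differ.
import numpy as np

rng = np.random.default_rng(0)

# A tiny "trained" linear classifier on synthetic 2-class data.
X = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(int)
w = w_true.copy()                      # pretend these are the trained weights

def quantize(weights, scale=64):
    """Map float weights to int8, the kind of format an edge accelerator stores."""
    return np.clip(np.round(weights * scale), -128, 127).astype(np.int8)

def accuracy(q_weights, scale=64):
    pred = (X @ (q_weights.astype(float) / scale) > 0).astype(int)
    return (pred == y).mean()

def inject_bit_flips(q_weights, n_flips):
    """Flip n random bits across the weight memory (single-event-upset style)."""
    faulty = q_weights.copy().view(np.uint8)   # work on the raw bit pattern
    for _ in range(n_flips):
        idx = rng.integers(len(faulty))
        bit = rng.integers(8)
        faulty[idx] ^= np.uint8(1 << bit)      # flipping bit 7 changes the sign
    return faulty.view(np.int8)

qw = quantize(w)
print("fault-free accuracy:", accuracy(qw))
for n in (1, 4, 16):
    accs = [accuracy(inject_bit_flips(qw, n)) for _ in range(100)]
    print(f"{n:2d} bit flips -> mean accuracy {np.mean(accs):.3f}")
```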
Presentations:
- Fault Tolerance of Neural Network Hardware Accelerators for Autonomous Driving
Adrian Evans (CEA-Leti, Grenoble, France), Lorena Anghel (Grenoble-INP, SPINTEC, Grenoble, France), and Stéphane Burel (CEA-Leti, Grenoble, France)
- Exploiting Approximate Computing for Efficient and Reliable Convolutional Neural Networks
Alberto Bosio (École Centrale de Lyon, INL, Lyon, France)
- Reliability Assessment and Fault Tolerance of Spiking Neural Network Hardware Accelerators
Haralampos-G. Stratigopoulos (Sorbonne University, CNRS, LIP6, Paris, France)
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | W01.2.1 | FAULT TOLERANCE OF NEURAL NETWORK HARDWARE ACCELERATORS FOR AUTONOMOUS DRIVING Speaker: Adrian Evans, CEA-Leti, FR |
11:20 CET | W01.2.2 | EXPLOITING APPROXIMATE COMPUTING FOR EFFICIENT AND RELIABLE CONVOLUTIONAL NEURAL NETWORKS Speaker: Alberto Bosio, École Centrale de Lyon, FR |
11:40 CET | W01.2.3 | RELIABILITY ASSESSMENT AND FAULT TOLERANCE OF SPIKING NEURAL NETWORK HARDWARE ACCELERATORS Speaker: Haralampos-G. Stratigopoulos, Sorbonne Universités, CNRS, LIP6, FR |
W01.T2 Technical Session 2 - Testing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:00 CET - 13:00 CET
Session Chair:
Melanie Schillinsky, NXP, DE
Time | Label | Presentation Title Authors |
---|---|---|
12:00 CET | W01.T2.1 | PERCEPTION AND REALITY CHECK INTO V-STRESS FOR SCREENING DEFECTIVE PARTS IN AUTOMOTIVE RELIABILITY Speaker: Lieyi Sheng, onsemi, BE |
12:15 CET | W01.T2.2 | POWER CYCLING BODY DIODE CURRENT FLOW ON SIC MOSFET DEVICE Speaker: Giovanni Corrente, STMicroelectronics, IT |
12:30 CET | W01.T2.3 | REDUCING ROUTING OVERHEAD USING NATURAL LOOPS Speaker: Tobias Kilian, TU Munich / Infineon Technologies AG, DE |
12:45 CET | W01.T2.4 | A NOVEL METHOD FOR DISCOVERING ELECTRICALLY EQUIVALENT DEFECTS IN ANALOG/MIXED-SIGNAL CIRCUITS Speaker: Mayukh Bhattacharya, Synopsys, US |
W01.T3 Technical Session 3 - Reliability and Safety
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:30 CET - 15:30 CET
Session Chair:
Michelangelo Grosso, STMicroelectronics, IT
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | W01.T3.1 | IMPROVING INSTRUCTION CACHE MEMORY RELIABILITY UNDER REAL-TIME CONSTRAINTS Speaker: Fabien Bouquillon, Université de Lille, FR |
14:45 CET | W01.T3.2 | COMMON DATA LANGUAGE CONNECTING HTOL TESTING TO IN-FIELD USE Speaker: Marc Hutner, proteanTecs, CA |
15:00 CET | W01.T3.3 | EFFICIENT USE OF ON-LINE LOGICBIST TO ACHIEVE ASIL B IN A GPU IP Speaker: Lee Harrison, Siemens EDA, GB |
15:15 CET | W01.T3.4 | VERIFICATION AND VALIDATION OF SAFETY ELEMENT OUT OF CONTEXT Speaker: Shivakumar Chonnad, Synopsys Inc, US |
W01.ET Embedded Tutorial - IEEE P2851 advancements
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:30 CET - 16:10 CET
Session Chair:
Oscar Ballan, Ethernovia, US
Organiser:
Jyotika Athavale, NVIDIA, US
Speakers:
Bernhard Bauer, Synopsys, DK
Meirav Nitzan, Synopsys, US
W01.3 Invited Session 2: The challenges of reaching zero defect and functional safety – and how the EDA industry tackles them
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 16:30 CET - 17:30 CET
Session Chair:
Daniel Tille, Infineon, DE
Organisers:
Riccardo Cantoro, Politecnico di Torino, IT
Daniel Tille, Infineon, DE
Abstract: Automotive microcontrollers have become very complex Systems-on-Chip (SoCs). The megatrends of Advanced Driver-Assistance Systems (ADAS) and Automated Driving (AD), but also traditional applications such as powertrain and steering, require ever-increasing functionality. However, these safety-critical environments require zero defects, and the implementation of functional safety measures together with the rising complexity poses significant challenges to satisfying these requirements. This special session addresses these challenges and shows potential solutions to overcome them with the help of the EDA industry.
Presentations:
- Automated solutions for safety and security vulnerabilities
Teo Cupaiuolo (Synopsys)
- Functional Safety: an EDA perspective
Alessandra Nardi (Cadence)
- The Zero Defect Goal For Automotive ICs
Lee Harrison (Siemens EDA); Nilanjan Mukherjee (Siemens)
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | W01.3.1 | AUTOMATED SOLUTIONS FOR SAFETY AND SECURITY VULNERABILITIES Speaker: Teo Cupaiuolo, Synopsys, IT |
16:50 CET | W01.3.2 | FUNCTIONAL SAFETY: AN EDA PERSPECTIVE Speaker: Alessandra Nardi, Cadence, US |
17:10 CET | W01.3.3 | THE ZERO DEFECT GOAL FOR AUTOMOTIVE ICS Speaker: Lee Harrison, Siemens EDA, GB |
W01.4 Panel: What are the limitations of EDA tools with respect to zero defects and FuSa?
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 17:30 CET - 18:30 CET
Session Chair:
Wim Dobbelaere, onsemi, BE
Organiser:
Davide Appello, STMicroelectronics, IT
Panellists:
Antonio Priore, ARM, GB
Georges Gielen, KU Leuven, BE
Chen He, NXP, US
Mauro Pipponzi, ELES, IT
Vladimir Zivkovic, Infineon, DK
Om Ranjan, STMicroelectronics, IN
Abstract: Product segments with high quality demands, such as automotive, transportation, and aerospace, have been characterized by persistent needs over several years:
- Zero defects, or in general very low defective levels
- Accurate modeling and prediction of product reliability
The sustainability of these objectives is challenged by the relentless demand for higher-performance products and the consequent move to higher complexity and advanced technology nodes.
Functional safety standards and requirements aim to guarantee the usability of products in safety-critical applications and add several requirements whose satisfaction is a key challenge during the development of a new product.
This panel session will debate with the experts how effectively the available EDA tools help to face the described challenges.
As an example, these are suitable questions that anyone in the field may need answering:
- How does EDA help to effectively resolve requirements traceability "end-to-end"? Does this represent a sustainable effort?
- Is DFT effective enough in addressing fault models to reach target quality?
- Is verification/simulation/validation effective with respect to transient faults?
W01.5 Closing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 18:30 CET - 18:45 CET
Chairs:
Paolo Bernardi, Politecnico di Torino, IT
Riccardo Cantoro, Politecnico di Torino, IT
Yervant Zorian, Synopsys, US
Wim Dobbelaere, onsemi, BE
W07 European Workshop on Silicon Lifecycle Management (eSLM)
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:00 CET - 18:00 CET
Organisers:
Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Yervant Zorian, Synopsys, US
Aim and Scope
With increasing system complexity, security concerns, stringent runtime requirements for functional safety, and the cost constraints of a mass market, the reliable and secure operation of electronics in safety-critical, enterprise-server, and cloud-computing domains remains a major challenge. While design-time and test-time solutions were traditionally supposed to guarantee the in-field dependability and security of electronic systems, the complex interaction of runtime effects from running workloads and the environment creates a great need for a holistic approach to silicon lifecycle management, spanning from design time to in-field monitoring and adaptation. Therefore, solutions for lifecycle management should include various sensors and monitors embedded at different levels of the design stack, access mechanisms and standards for such on-chip and in-system sensor networks, as well as data analytics on the edge and in the cloud. This European edition of the eSLM Workshop aims to build a community around this topic and to offer a forum where researchers and practitioners alike can present and discuss these challenges and emerging solutions.
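As a minimal illustration of the edge-side data analytics mentioned above (and of the anomaly-detection topic listed below), the following sketch flags readings from a hypothetical on-chip sensor stream that drift outside a rolling mean-and-deviation band; the window size, threshold, and sensor model are illustrative assumptions, not part of any workshop contribution.

```python
# Minimal sketch of edge-side anomaly detection on an on-chip sensor stream,
# e.g. a ring-oscillator-based aging or temperature monitor. Thresholds, window
# size and the sensor model are invented for illustration only.
from collections import deque
import math, random

def rolling_zscore_alarm(samples, window=50, threshold=4.0):
    """Yield (index, value) for samples that deviate strongly from the recent mean."""
    history = deque(maxlen=window)
    for i, x in enumerate(samples):
        if len(history) == window:
            mean = sum(history) / window
            var = sum((v - mean) ** 2 for v in history) / window
            std = math.sqrt(var) or 1e-9
            if abs(x - mean) / std > threshold:
                yield i, x
        history.append(x)

random.seed(1)
readings = [1.00 + random.gauss(0, 0.01) for _ in range(300)]
readings[200] += 0.3          # injected drift / fault, should be flagged
print(list(rolling_zscore_alarm(readings)))
```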
Topic Areas
The workshop’s areas of interest include (but are not limited to) the following topics:
- Design and placement of various sensors and monitors for functional safety and security
- Standards for sensor data aggregation
- Data analytics for sensor data processing
- Anomaly detection for security and functional safety
- Machine learning for in-field system health monitoring
- Multi-layer dependability evaluation
- In-field verification and validation
- Fault tolerance and self-checking circuits
- Aging effects on electronics
- Reuse and extension of test, debug and repair infrastructure for in-field management
- Power-up, power-down and periodic tests
- System level test
- Preventive Maintenance
- Concurrent and periodic checking
- Functional and structural test generation
- Graceful degradation
- Useful remaining lifetime prediction
- Failure prediction and forecasting
- Attack prediction and prevention
- In-field configuration and adaptation
- Cross-layer solutions
Preliminary Program
Panel Information
Panel: "Challenges of the SLM ecosystem"
Organizer: Hans-Joachim Wunderlich
Silicon lifecycle management covers a broad variety of aspects and goals, which may complement each other but can also be in conflict. Examples are runtime test and diagnosis versus security, data collection in the cloud versus privacy, BIST versus monitoring, or on-chip infrastructure versus reliability. Renowned experts, mainly from industry, will discuss various challenges of the different aspects of SLM:
Panelists:
- Dan Alexandrescu, IROC Technologies
- Jürgen Alt, Infineon
- Sonny Banwari, Advantest
- Artur Jutman, Testonica
- Martino Quattrocchi or Antonio Scrofani, ST
- Aileen Ryan, Siemens (Mentor)
- Mark Tehranipoor, FICS
W08 Workshop on Ferroelectronics
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:00 CET - 18:00 CET
Organisers:
Ian O'Connor, INL, FR
Stefan Slesazeck, NaMLab GmbH, DE
Bastien Giraud, CEA, FR
The Design, Automation, and Test in Europe conference and exhibition is the main European event bringing together researchers, vendors and specialists in hardware and software design, test and manufacturing of electronic circuits and systems. Friday Workshops are dedicated to emerging research and application topics. At DATE 2022, one of the Friday Workshops is devoted to the emerging field of ferroelectronics.
DESCRIPTION
Ferroelectric capacitors (FeCaps) and ferroelectric field-effect transistors (FeFETs) have attracted attention as next-generation devices, as they can serve as synaptic devices for neuromorphic implementations as well as multi-level memories and non-volatile transistors to achieve high integration. The elegant, fundamental physics of ferroelectricity is a century old, but the game-changer came with the discovery of hafnium-zirconium oxide (HZO) with high ferroelectricity (even at a thickness of several nanometers) that can be fabricated in a CMOS-compatible process. Ferroelectric devices are a versatile and energy-efficient approach with immense potential to revolutionize machine learning at the edge in a wide variety of architectures for multiple AI applications.
The goal of the Ferroelectronics workshop is to bring together experts from both academia and industry, interested in this exciting and rapidly evolving field, in order to foster exchanges and ideas around the latest state-of-the-art and discuss future challenges. This one-day event consists of a plenary keynote and invited talks, as well as two open-call sessions for regular and poster presentations.
SCHEDULE
9:00-9:30 Welcome and introduction (Ian O'Connor)
9:30-10:30 FeFET-based memory and logic design (moderator: Bastien Giraud)
- Ferroelectronic devices - electrical properties and design constraints (S. Slesazeck, NaMLab, DE)
- The FeFET - learning to handle this new powerful device available in 2x CMOS platforms (S. Beyer, GlobalFoundries, DE)
Abstract: With the discovery of ferroelectricity in HfO2-based thin films in 2011 and the co-integration of ferroelectric field-effect transistors (FeFETs) into standard high-k metal gate (HKMG) CMOS platforms in 2016/17 by GLOBALFOUNDRIES, the FeFET has evolved from a theoretical dream to an applicable reality. Having matured initially as a low-cost, low-power eFLASH replacement, the FeFET is much more than a classical, rigid eNVM cell. With its great HKMG CMOS compatibility, its flexibility and its unique switching properties, it is rather to be seen as a new versatile device that promises to open up new worlds. The neuromorphic design community in particular has shifted its focus towards this novel device with game-changing potential. In this talk we will discuss the current status of GlobalFoundries' FeFET technology, investigate the operation and use of this device, and discuss the remaining challenges and outlook.
Bio: Sven Beyer received his master's and PhD degrees in Physics from the University of Hamburg, Germany. He started his career with Infineon as a manufacturing engineer in the etch department in 2003. He joined the integration department of AMD in 2005. He spent a year in the ASTA alliance in 2007 working on the 45 nm node and has held many roles since then, throughout the separation of GLOBALFOUNDRIES from AMD. Today he serves as DMTS in GLOBALFOUNDRIES FAB1, overseeing mainly the eNVM roadmap and development in Dresden.
10:30-11:00 Coffee break
11:00-11:30 Regular paper session (moderator: Stefan Slesazeck)
- 2- and 3-Terminals BEOL-Compatible Ferroelectric Synaptic Weights: Scalability and Functionality (Laura Bégon-Lours, IBM Research, CH)
- FeFETs for Phase Encoded Oscillatory based Computing (Juan Núñez, IMSE-CNM, ES)
11:30-12:30 FeRAM-based memory arrays and new devices (moderator: Ian O'Connor)
- HfO2-based FeRAM arrays: current performance and perspectives for scaling (L. Grenouillet, CEA-Leti, FR)
Bio: Laurent Grenouillet received the Engineer degree in physics in 1998 from the National Institute of Applied Sciences (INSA) in Lyon, France, and the PhD degree in electronic devices in 2001 for his work on the optical spectroscopy of diluted nitrides grown on GaAs substrates. After a post-doctoral position in the field of Molecular Beam Epitaxy, he joined CEA-Leti in 2002 and worked on GaAs-based VCSELs emitting in the 1.1-1.3 μm range and on single-photon sources with quantum dots. In 2006, he joined the Silicon Photonics group, where he developed CMOS-compatible hybrid III-V on silicon lasers. In 2009, he joined the IBM Alliance in Albany as a Leti assignee to contribute to the development of FDSOI technology. Within Albany's state-of-the-art facilities, he worked extensively on device integration to improve the performance of FDSOI devices (28 nm and 14 nm nodes). Back in France at CEA-Leti in 2013, he focused on performance boosters for the 10 nm node FDSOI technology and took part in the FDSOI technology transfer to GlobalFoundries (22FDX) in 2015. During that period he joined the Advanced Memory Device Laboratory at CEA-Leti. His current research interests include resistive switching memory devices and ferroelectric HfO2-based memories. Laurent Grenouillet has authored or co-authored over 80 papers (conferences and journals) and has filed over 40 patents. He serves as a committee member of the Solid-State Devices and Materials (SSDM) conference.
- Comparison of FeFET memory cells: Performance metrics and applications (S. Muller, FMC, DE)
Abstract: Over the last decade, more and more research and development effort has been devoted to a memory cell called the Ferroelectric Field-Effect Transistor (FeFET). The FeFET was invented as early as 1957; however, it took more than five decades until the device could be demonstrated on a semiconductor production line in 2011. This milestone was achieved through the discovery of ferroelectric hafnium oxide, which finally enables the use of ferroelectric materials in standard production environments.
In this talk, we will review the progress that has been made on different FeFET memory cells, in particular FeFETs of the MFIS and MFMIS types. These two types of memory cells differ mainly in that the second type incorporates a floating gate within its gate stack. We will give an overview of the challenges and opportunities of both cell types and propose paths for further development and optimization.
Bio: Dr. Stefan Müller received the joint master’s degree in Microelectronics from Technical University Munich, Germany, and Nanyang Technological University Singapore in 2011. He also holds a German diploma degree in Mechatronics and Information Technology as well as a bachelor’s degree in Mechanical Engineering both from Technical University Munich, Germany (2011/2008). In 2011, he joined NaMLab gGmbH, a research institute of University of Technology Dresden. In 2015, he received his PhD degree for his work on HfO2-based ferroelectric devices. In 2016, he co-founded FMC – The Ferroelectric Memory Company where he currently holds the position of CTO.
12:30-13:30 Lunch break
13:30-14:30 Keynote – In-Memory Computing with Ferroelectronics (moderator: Ian O'Connor)
- M. Niemier, U. Notre Dame, US
Abstract: Researchers are working to build more efficient and/or higher performance logic devices, memory devices, and/or fabrics where processing logic and data storage are integrated at finer granularities. The latter is especially appealing owing to challenges with the omnipresent processor-memory bottleneck, which exacerbates the efficient and expeditious processing of modern workloads.
A new device may (1) serve as a replacement for an existing technology, in an existing architecture (e.g., a new memory cell in a traditional array), or (2) serve as an “enabler” of a new circuit architecture and/or compute functionality (e.g., a new memory cell that can natively perform an important compute kernel). For (2), it is imperative to consider if (a) a new technology can perform said kernel more efficiently when compared to either CMOS and/or an existing architectural solution, (b) if said kernel can be used broadly – for a range of applications and/or within an application to justify investment, and (c) if said kernel fundamentally changes an existing algorithm – e.g., which may impact accuracy in a machine learning task.
In this talk, we consider use case (2). We study the impact of different technology-enabled, content addressable memory (CAM)-based, in-memory matching functions as applied to hyperdimensional computing problems. (1) We highlight the efficacy of multi-bit and analog CAMs when used to implement and analyze hypervectors via native distance functions, and compare solutions to approaches with higher dimensional precision/cosine distance functions; (2) We discuss how realistic implementation constraints including (a) the inherent precision of technology-based CAM solutions, (b) CAM architectures with appropriately sized sub-arrays to accommodate realistically-sized hypervectors, and (c) inherent device variations can be overcome to match the accuracy of GPU-based realizations; (3) We quantify (a) the impact of tradeoffs associated with technology-based solutions (e.g., longer hypervectors, peripheral circuitry to aggregate results from CAM sub-arrays, etc.) that may be needed to achieve iso-accuracy with existing solutions, (b) the resulting impact of energy and latency at the application-level, and (c) what parts of an existing workload technology-based solutions can accelerate / what the next design targets should be to achieve further improvements.
We conclude with a roadmap that illustrates how technology-based CAM solutions are applicable to a broad set of “at-scale” problems (e.g., applications in the MLPerf suite, bioinformatics workloads, etc.), how we might perform targeted/smart searches in extremely large subarrays (either in-memory, or in-storage), etc.
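As a rough software-level illustration of the hypervector matching the keynote abstract refers to (not the speaker's CAM designs), the sketch below classifies binary hypervectors by Hamming distance, the nearest-neighbour search that an associative, CAM-like array would perform in memory; the dimensionality, encoding, and data are invented for illustration.

```python
# Toy hyperdimensional classification via Hamming-distance matching, the kind of
# associative search a (multi-bit/analog) CAM would perform in memory.
# Encoding, dimensionality and data are invented for illustration.
import numpy as np

rng = np.random.default_rng(7)
D = 4096                                   # hypervector dimensionality

def random_hv():
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bundle(hvs):
    """Majority vote: the bundled vector is 1 where most inputs are 1."""
    return (np.sum(hvs, axis=0) * 2 > len(hvs)).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

# "Train": each class prototype is a bundle of noisy samples of a base pattern.
bases = {label: random_hv() for label in ("A", "B", "C")}

def noisy(hv, flip_frac=0.2):
    mask = rng.random(D) < flip_frac
    return np.where(mask, 1 - hv, hv).astype(np.uint8)

prototypes = {label: bundle([noisy(hv) for _ in range(10)])
              for label, hv in bases.items()}

# "Inference": a CAM row-match is a nearest-neighbour search by distance.
query = noisy(bases["B"], flip_frac=0.3)
best = min(prototypes, key=lambda label: hamming(prototypes[label], query))
print("predicted class:", best)            # should print B (nearest prototype)
```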
Bio: Michael Niemier is a Professor at the University of Notre Dame. He is interested in designing, facilitating, benchmarking, and evaluating circuits and architectures based on emerging logic, memory, and storage technologies for at-scale workloads/problems. He is the recipient of multiple IBM Faculty Awards, the Rev. Edmund P. Joyce, C.S.C. Award for Excellence in Undergraduate Teaching, and best paper awards such as at ISLPED. Niemier has served on numerous TPCs for design related conferences, and has chaired the emerging technologies track at DATE, DAC, and ICCAD. He is an associate editor for IEEE Transactions on Nanotechnology, as well as the ACM Journal of Emerging Technologies in Computing.
14:30-15:00 Regular paper session (moderator: Bastien Giraud)
- Modeling of Fe-FDSOI FET for Memory and Neuromorphic Applications (S. Chatterjee, IIT Kanpur, IN)
- TC-MEM improvement: TCAM and normal memory in the same circuit (C. Marchand, ECL-INL, FR)
15:00-16:00 Panel – future challenges for ferroelectronics (moderator: Ian O'Connor)
Participants (to be confirmed):
S. Beyer, GlobalFoundries, DE
S. Slesazeck, NaMLab, DE
L. Grenouillet, CEA-Leti, FR
S. Muller, FMC, DE
Laura Bégon-Lours, IBM Research, CH
M. Niemier, U. Notre Dame, US
TOPIC AREAS
You are invited to participate at the DATE 2022 Friday Workshop on Ferroelectronics. The areas of interest include (but are not limited to) the following topics:
- Ferroelectric devices and integration (FeCap, FeFET, BEoL, FEoL …) for digital, analog and bio-inspired computing
- Ferroelectric device modeling
- Non-volatile ferroelectronic memory circuits – bitcells, arrays, peripheral circuitry
- Non-volatile ferroelectronic logic – digital / multi-valued gates and datapaths
- Reconfigurable ferroelectronics
- Analog/digital/bio-inspired in-memory computation with ferroelectrics
- Architectural-level design for processing-in-memory and compute-in-memory with ferroelectronics
- Benchmarking tools for ferroelectronic hardware accelerators
- Fault-tolerance, test and reliability
- Ferroelectronics for hardware security
W03 NeurONN Workshop on Neuromorphic Computing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 17:00 CET
This workshop is supported by Fraunhofer EMFT, Research Institution for Microsystems and Solid State Technologies
Participants can register for the workshop free of charge via the online registration platform.
Workshop Description
The NeurONN project aims to implement a novel and alternative energy-efficient neuromorphic computing paradigm based on oscillatory neural networks (ONNs), using energy-efficient devices such as metal-insulator-transition (MIT) devices to emulate "neurons" and 2D-material memristors to emulate "synapses", in order to achieve truly neuro-inspired computing.
At month 18 (M18) of the NeurONN project, we have developed a digital oscillatory neural network as a proof of concept of the computing-in-phase paradigm. The digital ONN has been implemented on an FPGA and tested on various tasks, such as image recognition on a live camera stream and robot obstacle avoidance with embedded proximity sensors. A robot with eight proximity sensors for obstacle avoidance was developed, and this work is now being transferred to the E4 robot from the AIM partner to embed the ONN in their existing E4 system.
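As a loosely related, purely illustrative sketch of phase-domain computing (not the project's FPGA design), the snippet below simulates a small Kuramoto-style oscillator network whose Hebbian coupling weights store one pattern; starting from a corrupted phase pattern, the oscillators settle back into the stored phase relations, which is the associative-memory flavour of computing in phase. All parameters are invented.

```python
# Toy phase-domain associative memory with coupled oscillators (Kuramoto-style).
# Coupling weights follow a Hebbian rule for one stored +-1 pattern; the network
# is purely illustrative and unrelated to the project's actual FPGA implementation.
import math, random

pattern = [1, -1, 1, 1, -1, -1, 1, -1]          # stored pattern (+-1 per oscillator)
n = len(pattern)
weights = [[pattern[i] * pattern[j] / n for j in range(n)] for i in range(n)]

# Start from a corrupted version (two elements flipped), encoded as phases 0 or pi,
# plus a small random perturbation so the dynamics can leave the unstable state.
random.seed(0)
corrupted = pattern[:]
corrupted[0] *= -1
corrupted[5] *= -1
phase = [(0.0 if s > 0 else math.pi) + random.uniform(-0.1, 0.1) for s in corrupted]

dt, steps = 0.05, 400
for _ in range(steps):
    dphase = [sum(weights[i][j] * math.sin(phase[j] - phase[i]) for j in range(n))
              for i in range(n)]
    phase = [p + dt * d for p, d in zip(phase, dphase)]

# Read out: oscillators in phase with the reference (+1) oscillator represent +1,
# anti-phase oscillators represent -1.
ref = phase[2]                                   # pattern[2] == +1 by construction
readout = [1 if math.cos(p - ref) > 0 else -1 for p in phase]
print(readout == pattern)                        # expected: True, the flips are corrected
```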
W03.0 Welcome Note
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 08:35 CET
Chair:
Jamila Boudaden, Fraunhofer EMFT, DE
Co-Chair:
Eirini Karachristou, CNRS, FR
W03.1 NeurONN Project Overview
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:35 CET - 09:00 CET
Speaker:
Aida Todri-Sanial, CNRS, FR
Time | Label | Presentation Title Authors |
---|---|---|
08:35 CET | W03.1.1 | NEURONN PROJECT OVERVIEW Speaker: Aida Todri-Sanial, CNRS, FR |
W03.2 Projects related to Neuromorphic computing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 10:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
09:00 CET | W03.2.1 | PHOTONIC NEUROMORPHIC COMPUTING Speaker: Frank Brückerhoff-Plückelmann, University of Münster, DE |
09:30 CET | W03.2.2 | ALGORITHM-CIRCUITS-DEVICE CO-DESIGN FOR EDGE NEUROMORPHIC INTELLIGENCE – MEMSCALE PROJECT Speaker: Melika Payvand, University of Zurich and ETH Zurich, CH |
W03.3 Materials and Devices
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:30 CET - 12:30 CET
Time | Label | Presentation Title Authors |
---|---|---|
10:30 CET | W03.3.1 | MODELING UNCONVENTIONAL NANOSCALED DEVICE FABRICATION – MUNDFAB PROJECT Speaker: Peter Pichler, Fraunhofer IISB, DE |
11:10 CET | W03.3.2 | RESISTANCE SWITCHING MATERIALS AND DEVICES FOR NEUROMORPHIC COMPUTING Speaker: Sabina Spiga, CNR - IMM, IT |
11:50 CET | W03.3.3 | OSCILLATING NEURAL NETWORKS POWERED BY PHASE-TRANSITION VO2 NANODEVICES Speaker: Oliver Maher, IBM, CH |
W03.LB Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:30 CET - 13:30 CET
W03.4 Demonstrators
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 14:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
13:30 CET | W03.4.1 | NEURONN LIVE DEMONSTRATORS Presenter: Madeleine Abernot, CNRS, FR |
13:30 CET | W03.4.2 | NEURONN LIVE DEMONSTRATORS Presenter: Theophile Gonos, A.I.Mergence, FR |
13:30 CET | W03.4.3 | NEURONN LIVE DEMONSTRATORS Speaker: Thierry Gil, CNRS, FR |
W03.5 Neuromorphic Architecture & Design
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:00 CET - 15:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | W03.5.1 | EFFECT OF DEVICE MISMATCHES IN DIFFERENTIAL OSCILLATORY NEURAL NETWORKS Speaker: Jafar Shamsi, University of Calgary, CA |
14:30 CET | W03.5.2 | MACHINE LEARNING FOR THE DESIGN OF WAVE AND OSCILLATOR-BASED COMPUTING DEVICES Speaker: Gyorgy Csaba, Pázmány University Budapest, HU |
W03.CB Coffee Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:00 CET - 15:30 CET
W03.6 Neuromorphic Computing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:30 CET - 17:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
15:30 CET | W03.6.1 | FULLY SPINTRONIC RADIOFREQUENCY NEURAL NETWORKS Speaker: Alice Mizrahi, CNRS/Thales, FR |
16:00 CET | W03.6.2 | ANALOG OSCILLATORY NEURAL NETWORKS FOR ENERGY-EFFICIENT COMPUTING AT THE EDGE Speaker: Corentin Delacour, CNRS, FR |
16:30 CET | W03.6.3 | RELIABLE PROCESSING-IN-MEMORY BASED MANYCORE ARCHITECTURES FOR DEEP LEARNING: FROM CNNS TO GNNS Speaker: Partha Pratim Pande, Washington State University, US |
W04 OSHEAN - Open Source Hardware European Alliances and iNitiatives
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 17:15 CET
Organiser:
Christian Fabre, CEA, FR
The OSHEAN (Open Source Hardware European Alliances Network) workshop aims to contribute to the emergence of a European open source hardware ecosystem for ultra-low-power (ULP), secure microprocessors, microcontrollers and accelerators. The presentations and panels will cover the full value chain of open source hardware, with speakers from academia, foundations and forums, to industry.
They will enable discussion with the audience on how to develop and grow a European open source hardware ecosystem. We will address key issues such as the technical roadmap and gaps for open source hardware, opportunities in academic research, open source development sustainability, reliability of open source IPs, and funding and business aspects. The addressed topics will also cover legal and governance aspects of open source hardware, e.g. licensing, governance, liability and regulatory issues.
Organizers and speakers originate from across the open source value chain and cover key technologies for edge applications across all power and performance ranges, from deeply embedded to high-end computing, as well as other aspects of hardware development that are key to open source acceptance: compilation, open source design tools, simulation, verification, real-time and mixed criticality, including certification guidelines for the design of IPs for safe/secure applications.
Attendees will be able to participate in the start-up of a community whose objective is to create a strong European ecosystem in open source hardware.
08:30 | Opening: | ||||
08:30 | 08:40 | Christian Fabre (CEA) | Welcome & introduction | ||
08:40 | 09:00 | John D. Davis (BSC) | How do we see the challenges in open source hardware? | ||
09:00 | Session on Open Source Hardware Technologies, moderated by Davide Schiavone (OpenHW Group) |
Open Source Hardware (HW) is becoming a de facto industrial standard, with a growing population of available IPs ranging from ultra-low-power edge-computing platforms to high-performance, high-end systems. In particular, RISC-V systems attract interest among processor developers and users thanks to their simple and extensible, yet performant, instruction-set architecture (ISA) across a wide range of performance requirements. In this session, alongside a strategic view of RISC-V in industry, several IPs will be presented to show how open source is shaping the future of academic and industrial solutions. There will be showcases of open source artificial intelligence architectures as well as secure microprocessors. |
09:00 | 09:20 | Davide Rossi (Univ. di Bologna) | Open Source HW for IoT and its impact on the Industrial Ecosystem: the PULP Experience | ||
09:20 | 09:40 | Yves Durand (CEA) | RISC-V & accelerators: enabling variable precision FP computing | ||
09:40 | 10:00 | Gavin Ferris (lowRISC) | lowRISC’s Collaborative Framework for Open Source Silicon Design | ||
10:00 | 10:30 | Q&A and discussion with speakers on Open Source HW technologies. | |||
10:30 | 11:00 | Break | |||
11:00 | Session on Open Source Software & Support Technologies for Open Source Hardware, moderated by Jérôme Quévremont (Thales) |
Besides the IP blocks addressed in the previous session, open source hardware requires several software technologies that benefit from open source collaboration: first of all, software development tools such as compilers, debuggers, bootloaders, operating systems, runtime frameworks, etc.; then design tools such as FPGA compilers, CAD tools, simulators, emulators, validation tools, etc. Last but not least, security benefits from open source at every level of the hardware and software stack. This session will report on several contributions to these support technologies and discuss the main upcoming issues in this field. |
11:00 | 11:15 | Michael Gielda (Antmicro) | Supporting open source hardware to enable commercial adoption | ||
11:15 | 11:30 | Roger Ferrer Ibanez (BSC) | Adding support for RISC-V “V” vector extension in LLVM. | ||
11:30 | 11:45 | Stefan Mangard (TU Graz) | Improving security through open source hardware | ||
11:45 | 12:00 | Frédéric Pétrot (Grenoble INP/TIMA) | Simulation of RISC-V 128-bit extension in QEMU | ||
12:00 | 12:30 | Q&A and discussion with the speakers on Open Software & Support Technologies for Open Source Hardware. | |||
12:30 | 13:30 | Lunch | |||
13:30 | Keynote by Calista Redmond (RISC-V International): RISC-V Open Era of Computing: Innovation, adoption, and opportunity in Europe and beyond, moderated by Christian Fabre (CEA) |
RISC-V is the undisputed lead architecture that has ushered in a profound new open era in compute. The innovations and implementations of RISC-V span from embedded to enterprise, from IoT to HPC. RISC-V is delivering on the extensions, tools, and investments of a global community ranging from start-ups to multi-nationals, from students to research fellows. This talk will highlight that progress and opportunity, with an invitation to engage. |
14:00 | Panel on Licensing, Funding, Cooperation & Regulation, moderated by Andrew Katz (Open Forum Europe) |
There are several non-technical challenges for Open Source Hardware that should be addressed at the regulatory and policy level. There is a need to research funding mechanisms that could increase Europe's influence and the level of participation of, and cooperation between, SMEs, academia and large companies, as well as to clarify the regulatory and licensing issues that might stifle innovation. Are there policy solutions that can support such developments and Open Source Hardware initiatives? Who can be a driver of change? In this panel the panellists will discuss these pertinent issues and share their experiences. |
14:00 | 14:05 | Andrew Katz (Open Forum Europe) | Introduction of the panel's topics and panelists | ||
14:05 | 14:10 | Mike Milinkovich (Eclipse Foundation) | After eating software, Open is “eating” everything! | ||
14:10 | 14:15 | Romano Hoofman (Europractice) | EUROPRACTICE as Breeding Ground for European Open Source Hardware Initiatives | ||
14:15 | 14:20 | Javier Serrano (CERN) | Funding open source hardware: getting the best from publicly funded research through commercial partnerships | ||
14:20 | 14:25 | Arian Zwegers (Head of Sector, European Commission) | Reinforcing large-scale design capacities: a partial view from a funding agency | ||
14:25 | 14:30 | Calista Redmond (RISC-V International) | Building European RISC-V Leadership in Global Open Source Hardware | ||
14:30 | 14:35 | Andrew Katz (Open Forum Europe) | What are legal challenges for widespread use of open source HW? What are the licensing issues? | ||
14:35 | 15:30 | Panel on Licensing, Funding, Cooperation & Regulation. | |||
15:30 | 15:45 | Break | |||
15:45 | Panel on Industrial Concerns, moderated by Frank K. Gürkaynak (ETH Zürich) |
Open source hardware, especially around the RISC-V architecture, has been a talking point in recent years. Development has been rapid: from humble beginnings, when open source hardware was a niche shared by enthusiasts and academics, we now have multi-billion dollar companies and, recently, a commitment from the European Commission to support work on open source hardware. In this session, we would like to go beyond the buzz and discuss with people involved in industry what opportunities they see, what the potential roadblocks are, and what they think is still missing. |
15:45 | 15:50 | Frank K. Gürkaynak (ETH Zürich) | Introduction of the panel's topics and panelists | ||
15:50 | 15:55 | Rick O'Connor (OpenHW Group) | OpenHW CORE-V: RISC-V open-source cores for high volume production SoCs | ||
15:55 | 16:00 | Loïc Lietar (GreenWaves) | Leveraging open source hardware in commercial products: benefits and challenges | ||
16:00 | 16:05 | Matthias Hiller (Fraunhofer, ECSO) | Can open source HW address industrial concerns for cybersecurity and trusted electronics? | ||
16:05 | 16:10 | Jean-Christian Kircher(Bosch France) | Industrial requirements for open source hardware | ||
16:10 | 16:15 | Thierry Collette (Thales) | THALES' perspectives on open source hardware | ||
16:15 | 16:20 | Zdeněk Přikryl (Codasip) | Industrial concerns about open hardware | ||
16:20 | 17:15 | Panel with the speakers on Industrial concerns of open source hardware. | |||
17:15 | Closing |
The organizers of the OSHEAN workshop are:
- John D. Davis, Group Manager of the European Exascale Accelerator at Barcelona Supercomputing Center (Spain). Organiser of workshops on RISC-V and OpenPOWER.
- Christian Fabre, Research Engineer at CEA LIST (Grenoble, France). Organiser of workshops on RISC-V and open source hardware.
- Benedikt Gierlichs, research expert on embedded security at KU Leuven (Belgium).
- Paula Grzegorzewska, Senior Policy Advisor at OpenForum Europe (Brussels, Belgium).
- Frank K. Gürkaynak, senior scientist of the group of Digital Circuits & Systems, Department of Information Technology and Electrical Engineering (D-ITET), ETH Zurich (Switzerland).
- Jérôme Quévremont, open hardware project leader at Thales R&T (Palaiseau, France), contributor to RISC-V International and the OpenHW Group.
- Davide Rossi, Associate Professor at the University of Bologna (Italy), contributing to open source hardware since 2013 through the PULP project.
- Davide Schiavone is Director of Engineering at OpenHW Group and coordinator of the OpenHW Europe Working Group under the Eclipse Foundation (Germany).
- Stefan Wallentowitz, professor at Munich University of Applied Sciences (Germany), director, Free and Open Source Silicon Foundation, & board member, RISC-V International.
W04.0 Opening
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 09:00 CET
Session chair:
Christian Fabre, CEA, FR
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | W04.0.1 | WELCOME & INTRODUCTION Speaker: Christian Fabre, CEA, FR |
08:40 CET | W04.0.2 | HOW DO WE SEE THE CHALLENGES IN OPEN SOURCE HARDWARE? Speaker: John Davis, BSC, ES |
W04.1 Open Source Hardware Technologies
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 10:30 CET
Session chair:
Davide Schiavone, OpenHW Group, IT
Open Source Hardware (HW) is becoming a de facto industrial standard, with a growing population of available IPs ranging from ultra-low-power edge-computing platforms to high-performance, high-end systems. In particular, RISC-V systems attract interest among processor developers and users thanks to their simple and extensible, yet performant, instruction-set architecture (ISA) across a wide range of performance requirements. In this session, alongside a strategic view of RISC-V in industry, several IPs will be presented to show how open source is shaping the future of academic and industrial solutions. There will be showcases of open source artificial intelligence architectures as well as secure microprocessors.
Time | Label | Presentation Title Authors |
---|---|---|
09:00 CET | W04.1.1 | OPEN SOURCE HW FOR IOT AND ITS IMPACT ON THE INDUSTRIAL ECOSYSTEM: THE PULP EXPERIENCE Speaker: Davide Rossi, Univ. di Bologna, IT |
09:20 CET | W04.1.2 | RISC-V & ACCELERATORS: ENABLING VARIABLE PRECISION FP COMPUTING Presenter: Yves Durand, CEA, FR |
09:40 CET | W04.1.3 | LOWRISC’S COLLABORATIVE FRAMEWORK FOR OPEN SOURCE SILICON DESIGN Presenter: Gavin Ferris, lowRISC, GB |
10:00 CET | W04.1.4 | PANEL WITH SPEAKERS ON OPEN SOURCE HW TECHNOLOGIES. Panellists: Yves Durand1, Davide Rossi2 and Gavin Ferris3 1CEA, FR; 2Univ. di Bologna, IT; 3lowRISC, GB Moderator: Davide Schiavone, OpenHW Group, IT Abstract Q&A session with the audience and panel discussion on Open Source HW technologies. |
W04.2 Open Source Software & Support Technologies for Open Source Hardware
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:00 CET - 12:30 CET
Session chair:
Jérôme Quévremont, Thales, FR
Besides the IP blocks addressed in the previous session, open source hardware requires several software technologies that benefit from open source collaboration: first of all, software development tools such as compilers, debuggers, bootloaders, operating systems, runtime frameworks, etc.; then design tools such as FPGA compilers, CAD tools, simulators, emulators, validation tools, etc. Last but not least, security benefits from open source at every level of the hardware and software stack. This session will report on several contributions to these support technologies and discuss the main upcoming issues in this field.
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | W04.2.1 | SUPPORTING OPEN SOURCE HARDWARE TO ENABLE COMMERCIAL ADOPTION Presenter: Michael Gielda, Antmicro, PL |
11:15 CET | W04.2.2 | ADDING SUPPORT FOR RISC-V “V” VECTOR EXTENSION IN LLVM. Presenter: Roger Ferrer Ibanez, BSC, ES |
11:30 CET | W04.2.3 | IMPROVING SECURITY THROUGH OPEN SOURCE HARDWARE Presenter: Stefan Mangard, TU Graz, AT |
11:45 CET | W04.2.4 | SIMULATION OF RISC-V 128-BIT EXTENSION IN QEMU Presenter: Frédéric Pétrot, Grenoble INP/TIMA, FR |
12:00 CET | W04.2.5 | PANEL WITH THE SPEAKERS ON OPEN SOFTWARE & SUPPORT TECHNOLOGIES FOR OPEN SOURCE HARDWARE Panellists: Michael Gielda1, Roger Ferrer Ibanez2, Stefan Mangard3 and Frédéric Pétrot4 1Antmicro, PL; 2BSC, ES; 3TU Graz, AT; 4Grenoble INP/TIMA, FR Moderator: Jérôme Quévremont, Thales, FR Abstract Q&A session with the audience and panel discussion on Open Software & Support Technologies for Open Source Hardware. |
W04.3 Keynote - RISC-V Open Era of Computing: Innovation, adoption, and opportunity in Europe and beyond
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 14:00 CET
Keynote Speaker:
Calista Redmond, RISC-V International, US
RISC-V is the undisputed lead architecture that has ushered in a profound new open era in compute. The innovations and implementations of RISC-V span from embedded to enterprise, from IoT to HPC. RISC-V is delivering on the extensions, tools, and investments of a global community ranging from start-ups to multi-nationals, from students to research fellows. This talk will highlight that progress and opportunity, with an invitation to engage.
W04.4 Panel on Licensing, Funding, Cooperation & Regulation
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:00 CET - 15:30 CET
Session chair:
Andrew Katz, OpenForum Europe, GB
There are several non-technical challenges for Open Source Hardware that should be addressed at the regulatory and policy level. There is a need to research funding mechanisms that could increase Europe's influence and the level of participation of, and cooperation between, SMEs, academia and large companies, as well as to clarify the regulatory and licensing issues that might stifle innovation. Are there policy solutions that can support such developments and Open Source Hardware initiatives? Who can be a driver of change? In this panel the panellists will discuss these pertinent issues and share their experiences.
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | W04.4.1 | INTRODUCTION OF THE PANEL'S TOPICS AND PANELISTS Presenter: Andrew Katz, OpenForum Europe, GB |
14:05 CET | W04.4.2 | AFTER EATING SOFTWARE, OPEN IS “EATING” EVERYTHING! Panellist: Mike Milinkovich, Eclipse Foundation, BE |
14:10 CET | W04.4.3 | EUROPRACTICE AS BREEDING GROUND FOR EUROPEAN OPEN SOURCE HARDWARE INITIATIVES Presenter: Romano Hoofman, Europractice, BE |
14:15 CET | W04.4.4 | FUNDING OPEN SOURCE HARDWARE: GETTING THE BEST FROM PUBLICLY FUNDED RESEARCH THROUGH COMMERCIAL PARTNERSHIPS Presenter: Javier Serrano, CERN, CH |
14:20 CET | W04.4.5 | REINFORCING LARGE-SCALE DESIGN CAPACITIES: A PARTIAL VIEW FROM A FUNDING AGENCY Presenter: Arian Zwegers, European Commission, BE |
14:25 CET | W04.4.6 | BUILDING EUROPEAN RISC-V LEADERSHIP IN GLOBAL OPEN SOURCE HARDWARE Presenter: Calista Redmond, RISC-V International, US |
14:30 CET | W04.4.7 | WHAT ARE LEGAL CHALLENGES FOR WIDESPREAD USE OF OPEN SOURCE HW? WHAT ARE THE LICENSING ISSUES? Presenter: Andrew Katz, OpenForum Europe, GB |
14:35 CET | W04.4.8 | PANEL DISCUSSION ON LICENSING, FUNDING, COOPERATION & REGULATION. Panellists: Romano Hoofman1, Javier Serrano2, Arian Zwegers3, Calista Redmond4 and Mike Milinkovich5 1Europractice, BE; 2CERN, CH; 3European Commission, BE; 4RISC-V International, US; 5Eclipse Foundation, BE Moderator: Andrew Katz, OpenForum Europe, GB Abstract Q&A session with the audience and panel discussion on Licensing, Funding, Cooperation & Regulation |
W04.5 Panel on Industrial Concerns
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:45 CET - 17:15 CET
Session chair:
Frank Gürkaynak, ETH Zürich, CH
Open source hardware, especially around the RISC-V architecture, has been a talking point in recent years. Development has been rapid: from humble beginnings, when open source hardware was a niche shared by enthusiasts and academics, we now have multi-billion-dollar companies and, recently, a commitment from the European Commission to support work in open source hardware. In this session, we would like to go beyond the buzz and discuss with people involved in industry what opportunities they see, what the potential roadblocks are, and what they think is still missing.
Time | Label | Presentation Title Authors |
---|---|---|
15:45 CET | W04.5.1 | INTRODUCTION OF THE PANEL'S TOPICS AND PANELISTS Presenter: Frank Gürkaynak, ETH Zürich, CH |
15:50 CET | W04.5.2 | OPENHW CORE-V: RISC-V OPEN-SOURCE CORES FOR HIGH VOLUME PRODUCTION SOCS Presenter: Rick O'Connor, OpenHW Group, CA |
15:55 CET | W04.5.3 | LEVERAGING OPEN SOURCE HARDWARE IN COMMERCIAL PRODUCTS: BENEFITS AND CHALLENGES Presenter: Loïc Lietar, GreenWaves, FR |
16:00 CET | W04.5.4 | CAN OPEN SOURCE HW ADDRESS INDUSTRIAL CONCERNS FOR CYBERSECURITY AND TRUSTED ELECTRONICS? Presenter: Matthias Hiller, Fraunhofer, ECSO, DE |
16:05 CET | W04.5.5 | INDUSTRIAL REQUIREMENTS FOR OPEN SOURCE HARDWARE Presenter: Jean-Christian Kircher, Bosch, FR |
16:10 CET | W04.5.6 | THALES' PERSPECTIVES ON OPEN SOURCE HARDWARE Presenter: Thierry Collette, Thales, FR |
16:15 CET | W04.5.7 | INDUSTRIAL CONCERNS ABOUT OPEN HARDWARE Presenter: Zdeněk Přikryl, Codasip, CZ |
16:20 CET | W04.5.8 | PANEL DISCUSSION ON INDUSTRIAL CONCERNS Panellists: Rick O'Connor1, Loïc Lietar2, Matthias Hiller3, Jean-Christian Kircher4, Thierry Collette5 and Zdeněk Přikryl6 1OpenHW Group, CA; 2GreenWaves, FR; 3Fraunhofer, ECSO, DE; 4Bosch, FR; 5Thales, FR; 6Codasip, CZ Moderator: Frank Gürkaynak, ETH Zürich, CH Abstract Q&A session with the audience and panel discussion on industrial concerns of open source hardware. |
W04.0 Opening
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 09:00 CET
Session chair:
Christian Fabre, CEA, FR
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | W04.0.1 | WELCOME & INTRODUCTION Speaker: Christian Fabre, CEA, FR |
08:40 CET | W04.0.2 | HOW DO WE SEE THE CHALLENGES IN OPEN SOURCE HARDWARE? Speaker: John Davis, BSC, ES |
W09 Sustainability in Security, Security for Sustainability
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 17:00 CET
Organisers:
Elif Bilge Kavun, University of Passau, DE
Francesco Regazzoni, University of Amsterdam, NL and Università della Svizzera italiana, CH
Speakers:
Nikolaos Athanasios Anagnostopoulos, University of Passau & TU Darmstadt, DE
Md Masoom Rabbani, ES&S, imec-COSIC, ESAT, BE
Owen Millwood, University of Sheffield, GB
Harishma Boyapally, Indian Institute of Technology Kharagpur, IN
Kanad Basu, University of Texas at Dallas, US
Apostolos Fournaris, ISI, GR
Keynote Speakers:
Jan Tobias Muehlberg, IMEC-DistriNet, KU Leuven, BE
Paola Grosso, University of Amsterdam, NL
Ilia Polian, University of Stuttgart, DE
Scope and aim: Security is a fundamental extra-functional requirement that systems should provide. As such, it should be implemented in a sustainable way, namely with very limited energy consumption and with at least some support for crypto-agility (so that security primitives can be updated rather than whole devices replaced). These two properties are challenging to achieve, since new attacks and weaknesses are discovered every day and simple updates may not be sufficient to defeat them. The situation is further complicated by the fact that, at this moment, families of cryptographic algorithms are being replaced by new standards (such as the post-quantum ones).
Security can in turn be of great help in supporting sustainability, for instance by allowing secure updates of devices and enabling maintenance that extends the devices' lifetime. Yet, support for these features should be studied in depth and fully understood to avoid the involuntary introduction of security weaknesses.
This workshop addresses the relation between sustainability and security from both sides, discussing what can be done to make security more sustainable and presenting what security can offer to make electronic devices more sustainable.
Topic Areas – You are invited to participate and submit your contributions to the Sustainability in Security, Security for Sustainability Workshop. The workshop’s areas of interest include (but are not limited to) the following topics:
- Low-energy cryptographic implementations
- Reusable and backward compatible security solutions
- Crypto agility
- Lightweight cryptography
- Support for secure and reliable updates
The workshop program is available on the workshop website: http://sussec22.alari.ch/
W09.0 Openings and Welcome
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 09:15 CET
Organisers:
Francesco Regazzoni, University of Amsterdam, NL and Università della Svizzera italiana, CH
Elif Bilge Kavun, University of Passau, DE
W09.1 Keynote 1
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:15 CET - 10:00 CET
Session chair:
Subhadeep Banik, Università della Svizzera italiana, CH
Time | Label | Presentation Title Authors |
---|---|---|
09:15 CET | W09.1.1 | NANOELECTRONICS: AN ENABLER FOR SUSTAINABLE SECURITY Keynote Speaker: Ilia Polian, University of Stuttgart, DE |
W09.2 Session 1: Lightweight Physical Primitives for Security
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:15 CET - 11:15 CET
Session chair:
Apostolos Fournaris, ISI, GR
Time | Label | Presentation Title Authors |
---|---|---|
10:15 CET | W09.2.1 | ON THE SUSTAINABILITY OF LIGHTWEIGHT CRYPTOGRAPHY BASED ON FLASH PUF Speaker: Nikolaos Athanasios Anagnostopoulos, University of Passau & TU Darmstadt, DE |
10:35 CET | W09.2.2 | A ONE-TIME PUF PROTOCOL FOR IMPROVING SUSTAINABILITY AND SECURITY FOR HARDWARE-ROOTED LIGHTWEIGHT SECURITY SYSTEMS Speaker: Owen Millwood, University of Sheffield, GB |
10:55 CET | W09.2.3 | PHYSICALLY RELATED FUNCTIONS: A NEW HARDWARE SECURITY PRIMITIVE FOR LIGHTWEIGHT CRYPTOGRAPHIC PROTOCOLS Speaker: Harishma Boyapally, Indian Institute of Technology Kharagpur, IN |
W09.3 Keynote 2
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:30 CET - 12:30 CET
Session chair:
Francesco Regazzoni, University of Amsterdam, NL and Università della Svizzera italiana, CH
Time | Label | Presentation Title Authors |
---|---|---|
11:30 CET | W09.3.1 | SUSTAINABLE SECURITY: WHAT DO WE SUSTAIN, AND FOR WHOM? Keynote Speaker: Jan Tobias Muehlberg, IMEC-DistriNet, KU Leuven, BE |
W09.4 Keynote 3
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 14:30 CET
Session chair:
Elif Bilge Kavun, University of Passau, DE
Time | Label | Presentation Title Authors |
---|---|---|
13:30 CET | W09.4.1 | PROGRAMMABILITY AND SUSTAINABILITY IN THE FUTURE INTERNET Keynote Speaker: Paola Grosso, University of Amsterdam, NL |
W09.5 Session 2: Lightweight Security & Safety for Emerging Technologies
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:45 CET - 15:45 CET
Session chair:
Ilia Polian, University of Stuttgart, DE
Time | Label | Presentation Title Authors |
---|---|---|
14:45 CET | W09.5.1 | FUNCTIONAL SAFETY OF DEEP NEURAL NETWORK ACCELERATOR Speaker: Kanad Basu, University of Texas at Dallas, US |
15:05 CET | W09.5.2 | SUSTAINABLE LATTICE BASED CRYPTOGRAPHY USING OPENCL NUMBER-THEORETIC TRANSFORM Speaker: Apostolos Fournaris, ISI, GR |
15:25 CET | W09.5.3 | REMOTE ATTESTATION OF IOT DEVICES: PAST, PRESENT AND FUTURE Speaker: Md Masoom Rabbani, ES&S, imec-COSIC, ESAT, BE |
W09.6 Panel: Is Security An Enabler, An Enemy or Simply A Nightmare for Sustainability?
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 16:00 CET - 17:00 CET
Session Chairs:
Francesco Regazzoni, University of Amsterdam, NL and Università della Svizzera italiana, CH
Elif Bilge Kavun, University of Passau, DE
Panellists:
David Bol, UC Louvain, BE
Yuri Demchenko, University of Amsterdam, NL
Marc Stoettinger, Rhein Main University of Applied Sciences, DE
Ruggero Susella, ST Microelectronics, IT
W10 Friday Interactive Day of the Special Initiative on Autonomous Systems Design
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 17:00 CET
Organisers:
Chung-Wei Lin, National Taiwan University, TW
Sebastian Steinhorst, TU Munich, DE
Sponsors
Thanks to the sponsorship, participants can register for the workshop free of charge via the online registration platform.
Program
W10.S1 Opening & Keynote
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:30 CET - 09:30 CET
Session chair:
Rolf Ernst, TU Braunschweig, DE
Speaker:
Joseph Sifakis, Verimag Laboratory, FR
8:30 - 8:45: Opening of ASD Friday Interactive Day, Introduction of Sponsors
8:45 - 9:30: Keynote: Trustworthy Autonomous Systems Development
Keynote Speaker: Joseph Sifakis, Verimag Laboratory
Session Chair: Rolf Ernst, TU Braunschweig, DE
Abstract: Autonomous systems emerge from the need to automate existing organizations by progressively replacing human operators with autonomous agents. Their development raises multi-faceted challenges, which go well beyond the limits of weak AI.
We attempt an analysis of the current state of the art, focusing on design and validation.
First, we explain that existing approaches to agent design are unsatisfactory. Traditional model-based approaches are defeated by the complexity of the problem, while solutions based on end-to-end machine learning fail to provide the necessary trustworthiness guarantees. We advocate "hybrid design" solutions that take the best of each approach and seek tradeoffs between trustworthiness and performance. In addition, we argue that traditional case-by-case risk analysis and mitigation techniques are failing to scale, and we discuss the trend away from correctness at design time and toward reliance on runtime assurance techniques.
Second, we explain that simulation and testing remain the only realistic approach for global validation, and we show how current methods and practices can be transposed to autonomous systems by identifying the technical requirements involved.
We conclude by discussing the factors that will play a decisive role in the acceptance of autonomous systems, and by arguing for the urgent need for new theoretical foundations.
W10.S2 Human-Machine Systems
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:30 CET - 10:45 CET
Session chair:
Chung-Wei Lin, National Taiwan University, TW
Speaker & Panellists:
Meike Jipp, German Aerospace Center (DLR), DE
Lv Chen, Nanyang Technological University, SG
Shunsuke Aoki, National Institute of Informatics, JP
Autonomous vehicle technology continues to advance, but it will be a long time before human-driven vehicles are completely replaced by fully autonomous ones. Therefore, mixed traffic environments need to be handled to provide safe, efficient, and comfortable transportation. In this session, the experts will discuss the roles of autonomous vehicles, human-driven vehicles, and roadside units and share their visions on human-machine systems. The ultimate goal is to support a smooth transition from current transportation to autonomous transportation.
Agenda of Talks and Speakers/Panelists:
- 09:30 - 09:45: "Towards Cooperative Autonomous Vehicles for Mixed Traffic Environments", Shunsuke Aoki, National Institute of Informatics, JP
- 09:45 - 10:00: Meike Jipp, German Aerospace Center (DLR), DE
- 10:00 - 10:15: "Human-Like Autonomous Driving and Human-Machine Systems", Chen Lv, Nanyang Technological University, SG
- 10:15 - 10:45: Interactive panel discussion on Human-Machine Systems
W10.B1 Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:45 CET - 11:00 CET
W10.S3 Hardware and Components
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:00 CET - 12:15 CET
Session chair:
Sebastian Steinhorst, TU Munich, DE
Speaker & Panellists:
Bart Vermeulen, NXP, NL
Hai (Helen) Li, Duke University, US
Luca Benini, ETH Zurich, CH
For the design of autonomous systems, powerful hardware and system components are as much a core enabler of advanced autonomy as the software running on them. With the increase in cognitive capabilities enabled by integrated computation and sensing platforms, many opportunities as well as challenges arise in making hardware and AI-centric software operate fully synergistically and, hence, reach their full potential. The purpose of this session is to discuss the latest trends in hardware components and their design aspects for the efficient and holistic integration of computation, sensing and communication.
Agenda of Talks and Speakers/Panelists:
- 11:00 - 11:15: "PULP: An Open Ultra Low-Power Platform for Autonomous Nano Drones", Luca Benini, ETH Zürich, CH
- 11:15 - 11:30: "Challenges & Solutions for Next-Generation E/E Architectures", Bart Vermeulen, NXP, NL
- 11:30 - 11:45: "Efficient Machine Learning: Algorithms-Circuits-Devices Co-design", Hai (Helen) Li, Duke University, US
- 11:45 - 12:15: Interactive panel discussion on Hardware and Components
W10.B2 Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:15 CET - 13:30 CET
W10.S4 Panel on Autonomous Systems Design
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 15:00 CET
Session Chair:
David Harel, Weizmann Institute of Science, IL
Panellists:
Sandeep Neema, DARPA, US
Alberto Sangiovanni, UC Berkeley, US
Carlo Ghezzi, Politecnico di Milano, IT
Simon Burton, Fraunhofer, DE
Michael Paulitsch, Intel, DE
Arne Haman, Bosch Research, DE
Organizers:
- David Harel, Weizmann Institute of Science
- Joseph Sifakis, Verimag Laboratory
Moderator:
- David Harel, Weizmann Institute of Science
Topics to be discussed by the panel:
1) What is your vision for AS? For example, the role of these systems in the IoT and AI revolutions; autonomy as a step from weak to strong AI; the gap between automated and autonomous systems.
2) What challenges do you see in AS design? For example, AI-enabled end-to-end solutions; "hybrid design" approaches, integrating model- and data-driven components; systems engineering issues.
3) How should we ensure the reliability of AS? For example, achieving explainable AI; adapting and extending rigorous V&V techniques to ASs; ensuring safety based exclusively on simulation and testing.
4) Looking to the future, is the vision of total autonomy viable? How can we make it happen? For example, decisive factors for acceptance; research challenges; ethical issues; "easy" total autonomy categories.
W10.B3 Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:00 CET - 15:10 CET
W10.S5 V2X, Edge Computing and Connected Applications
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:10 CET - 16:50 CET
Session chair:
Dirk Ziegenbein, Robert Bosch GmbH, DE
Speaker & Panellists:
Frank Hofmann, Bosch, DE
Chun-Ting Chou, OmniEyes, TW
Ziran Wang, Toyota Motor North America R&D, US
Stefano Marzani, Amazon, US
Connectivity enables many advanced applications for vehicles. In particular, the interactions between vehicles and edge servers (or roadside units) further boost this trend and involve more players in the business. In this session, experts from an automaker, a supplier, a high-tech company, and a start-up will come together, describe their roles in the connected and edge-computing environment, and discuss potential integration or competition.
Agenda of Talks and Speakers/Panelists:
- 15:10 - 15:25: "Video Uberization Using Edge AI and Mobile Video", Chun-Ting Chou, OmniEyes, TW
- 15:25 - 15:40: "Connected Applications as Driver for Automation", Frank Hofmann, Bosch, DE
- 15:40 - 15:55: "Environmental parity between cloud and embedded edge as a foundation for software-defined vehicles", Stefano Marzani, Amazon, US
- 15:55 - 16:10: "Mobility Digital Twin with Connected Vehicles and Cloud Computing", Ziran Wang, Toyota Motor North America R&D, US
- 16:10 - 16:50: Interactive panel discussion on V2X, Edge Computing, and Connected Applications
W10.S6 Closing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 16:50 CET - 17:00 CET
W03.1 NeurONN Project Overview
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:35 CET - 09:00 CET
Speaker:
Aida Todri-Sanial, CNRS, FR
Time | Label | Presentation Title Authors |
---|---|---|
08:35 CET | W03.1.1 | NEURONN PROJECT OVERVIEW Speaker: Aida Todri-Sanial, CNRS, FR |
W01.1 Keynote: From combustion towards electrical cars
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:40 CET - 09:40 CET
Keynote Speaker:
Riccardo Groppo, Ideas and Motion, IT
Short bio: Riccardo Groppo received his MSc degree in Electronic Engineering from the Politecnico di Torino (Turin, Italy). He is the co-founder and CEO of Ideas & Motion, a high-tech company focused on IP development on silicon and the design of complex automotive control systems for niche applications. He is the Chairman of the Transportation Working Group and Board Vice-Chairman within EPoSS (European Platform on Smart Systems Integration). He is a member of the Technical Committee of several major international events (SAE World Congress, AMAA Conference and Smart System Integration Conference). He started his career with Honeywell Bull and then joined Centro Ricerche FIAT (CRF) in 1989, where he was involved in the design of innovative engine/vehicle automotive control systems. He was a member of the CRF team that developed the first automotive Common Rail system for a direct-injection Diesel engine. He was then involved in the design and industrialization of the MultiAir technology and the dry dual-clutch transmission. He was Head of the Automotive Electronics Design and Development Dept. at CRF (2002-2013), where he promoted the design of IP building blocks by means of ASIC technology in cooperation with Freescale Semiconductor and Robert BOSCH for FIAT/Chrysler applications. Those smart drivers are the de facto standard in automotive powertrain applications, with volumes exceeding 17 million parts/year. He holds more than 31 patents in the field of automotive electronics and embedded systems, most of which are currently in production on passenger cars.
W02 3D Integration: Heterogeneous 3D Architectures and Sensors
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:45 CET - 17:30 CET
Organiser:
Pascal VIVET, CEA-LIST, IRT Nanoelec, FR
Speakers:
Gianna Paulin, ETHZ, CH
Mauricio Altieri, CEA, FR
Prachi Shukla, Boston University, US
Denis Dutoit, CEA, FR
Shimeng Yu, GeorgiaTech, US
Andy Heinig, Fraunhofer, DE
Tathagata Srimani, Stanford University, US
Ricardo Carmona-Galán, CSIC-University of Seville, ES
Rajiv Mongia, INTEL, US
Jean-Luc Jaffard, PROPHESEE, FR
Pascal Vivet, CEA, FR
Anthony Mastroianni, SIEMENS EDA, US
Cédric Tubert, STMicroelectronics, FR
Sung-Kyu Lim, GeorgiaTech, US
Umesh Chand, NTU, SG
Keynote Speaker:
Seung-Chul (SC) Song, QUALCOMM, US
Workshop Description
3D technologies are becoming more and more pervasive in digital architectures, as a strong enabler for heterogeneous integration. Due to the large amount of data they require and the associated memory capacity, machine learning and AI accelerators could benefit from 3D integration not only in High Performance Computing (HPC), but also at the edge and in embedded HPC. 3D integration and the associated architectures open a wide spectrum of system solutions, from chiplet-based partitioning for High Performance Computing to various sensors such as fully integrated image sensors embedding AI features, and to the tight 3D coupling of computing and memory enabling an efficient in-memory-computing paradigm.
The goal of the 3D Integration Workshop is to bring together experts from both academia and industry, interested in this exciting and rapidly evolving field, in order to update each other on the latest state-of-the-art, exchange ideas, and discuss future challenges.
This one-day event consists of two keynotes and four sessions, with invited and submitted presentations. Previous editions of this workshop took place regularly in conjunction with earlier editions of the DATE conference.
Workshop Committee
- General co-Chairs:
- P. Vivet, CEA-LIST, IRT Nanoelec (FR)
- M. Badaroglu, Qualcomm (BE)
- Program co-Chairs :
- P. Ramm, Fraunhofer EMFT (DE)
- S. Mukhopadhyay, Georgia Tech (USA)
- Special Session Chair
- S. Mitra, Stanford University (USA)
- Industrial Liaison Chair
- Eric Ollier, CEA-Leti, IRT Nanoelec (FR)
Sponsorship
The DATE'2022 3D Integration: Heterogeneous 3D Architectures and Sensors workshop is technically co-sponsored by IRT Nanoelec.
Registration
For workshop registration, please follow the regular DATE registration web site: online registration platform.
Technical Program
(All times are given in CET time, European Time, UTC+1).
For time zones, note that the USA switches to daylight saving time on 13 March, while France switches on 27 March. For the workshop on 18 March, there will therefore be an 8-hour difference between PST and CET (instead of the usual 9 hours).
Workshop Start
8:45 – 9:00 Welcome Note from Organizers
First Keynote
Session Chair : Mustafa Badaroglu, Qualcomm, BE
9:00 – 9:45 “System Design Technology Co-Optimization for 3D Integration”,
Seung-Chul (SC) Song, Qualcomm, USA
Session 1 : Chiplet Partitioning and System Level Design
Session Chair : Pascal Vivet, CEA-LIST, France
09:45 – 10:05 “Occamy - A 2.5D Chiplet System for Ultra-Efficient Floating-Point Computing”,
Gianna Paulin, ETHZ, Switzerland.
10:05 – 10:25 “Chiplet Based Architecture : an answer for Europe Sovereignty in Computing ?”,
Denis Dutoit, CEA, France.
10:25 – 10:45 “Automotive electronic control unit (ECU) for ADAS application based on a Chiplet approach”,
Andy Heinig, Fabian Hopsch, Fraunhofer, Germany
10:45 – 11:15 Coffee break
Session 2 : 3D and Image Sensors
Session Chair : Eric Ollier, CEA-Leti, France
11:15 – 11:35 “Efficient image feature extraction exploiting massive parallelism through 3D-integration”,
Ricardo Carmona-Galán, CSIC-University of Seville, Spain.
11:35 – 11:55 “3D Integration for Smart Event Vision Sensors”,
Jean-Luc Jaffard, PROPHESEE, France.
11:55 – 12:15 “3D-stacked CMOS Image Sensors for High Performance Indirect Time-of-Flight”,
Cédric Tubert, STMicroelectronics, France
12:15 – 13:30 Lunch
Session 3 : Ultra High Density of 3D, Monolithic 3D
Session Chair : Saibal Mukhopadhyay, GeorgiaTech, USA
13:30 – 13:50 “Thin-film based monolithic 3D systems”,
Umesh Chand, Sonu Devi, Hasita Veluri, Aaron Thean, Mohamed Sabry Aly, NTU, Singapore.
13:50 – 14:10 “Temperature-Aware Monolithic 3D DNN Accelerators for Biomedical Applications”,
Prachi Shukla, Vasilis F. Pavlidis, Emre Salman, Ayse K. Coskun, Boston Univ., USA; Univ. of Manchester, UK; Stony Brook Univ., USA.
14:10 – 14:30 “A Compute-in-Memory Hardware Accelerator Design with BEOL Transistor based Reconfigurable Interconnect”,
Shimeng Yu, GeorgiaTech, USA; Suman Datta, Notre Dame Univ., USA.
14:30 - 14:50 “Nanosystems for Energy-Efficient Computing using Carbon Nanotube FETs and Monolithic 3D Integration”,
Tathagata Srimani, Stanford University, USA.
14:50 – 15:15 Coffee break
Second Keynote
Session Chair : Subhasish Mitra, Univ. Stanford, USA.
15:15 – 15:45 “3D stacking opportunities for Augmented Reality hardware systems”,
Edith Beigne, META, USA
Session 4 : 3D Design, Methodology and Thermal
Session Chair : Peter Ramm, Fraunhofer EMFT, Germany.
15:45 – 16:05 “EDA Tools and PPA Tradeoff Studies for Micro-bump and Hybrid Bond 3D ICs”,
Sung-Kyu Lim, GeorgiaTech, USA.
16:05 – 16:25 “Heterogeneous Packaging Design and Verification Workflows”,
Anthony Mastroianni, SIEMENS EDA, USA.
16:25 – 16:45 “Towards a Place and Route Flow for High Density 3D-ICs”,
Mauricio Altieri, Olivier Billoint, Sebastien Thuries and Pascal Vivet, CEA, France.
16:45 – 17:05 “Challenges and Opportunities for Thermals in Heterogeneous 3D Packaging”,
Rajiv Mongia, INTEL, USA.
Closing
17:05 - 17:30 Closing Remarks
Technical Program
(All times are given in CET time, European Time, UTC+1).
(same information but using DATE web format for information replication within the DATE'2022 Virtual Showcase)
W02.0 Workshop Introduction
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 08:45 CET - 09:45 CET
Session chair:
Mustafa Badaroglu, QUALCOMM, BE
Time | Label | Presentation Title Authors |
---|---|---|
08:45 CET | W02.0.1 | WELCOME NOTE FROM ORGANISERS Speaker: Pascal Vivet, CEA, FR |
09:00 CET | W02.0.2 | KEYNOTE: SYSTEM DESIGN TECHNOLOGY CO-OPTIMIZATION FOR 3D INTEGRATION Keynote Speaker: Seung-Chul (SC) Song, QUALCOMM, US |
W02.1 Session 1: Chiplet Partitioning and System Level Design
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:45 CET - 10:45 CET
Session chair:
Pascal Vivet, CEA, FR
Time | Label | Presentation Title Authors |
---|---|---|
09:45 CET | W02.1.1 | OCCAMY - A 2.5D CHIPLET SYSTEM FOR ULTRA-EFFICIENT FLOATING-POINT COMPUTING Speaker: Gianna Paulin, ETHZ, CH |
10:05 CET | W02.1.2 | CHIPLET BASED ARCHITECTURE : AN ANSWER FOR EUROPE SOVEREIGNTY IN COMPUTING ? Speaker: Denis Dutoit, CEA, FR |
10:25 CET | W02.1.3 | AUTOMOTIVE ELECTRONIC CONTROL UNIT (ECU) FOR ADAS APPLICATION BASED ON A CHIPLET APPROACH Speaker: Andy Heinig, Fraunhofer, DE |
W02.2 Session 2: 3D and Image Sensors
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:15 CET - 12:15 CET
Session chair:
Eric Ollier, CEA, FR
Time | Label | Presentation Title Authors |
---|---|---|
11:15 CET | W02.2.1 | EFFICIENT IMAGE FEATURE EXTRACTION EXPLOITING MASSIVE PARALLELISM THROUGH 3D-INTEGRATION Speaker: Ricardo Carmona-Galán, CSIC-University of Seville, ES |
11:35 CET | W02.2.2 | 3D INTEGRATION FOR SMART EVENT VISION SENSORS Speaker: Jean-Luc Jaffard, PROPHESEE, FR |
11:55 CET | W02.2.3 | 3D-STACKED CMOS IMAGE SENSORS FOR HIGH PERFORMANCE INDIRECT TIME-OF-FLIGHT Speaker: Cédric Tubert, STMicroelectronics, FR |
W02.LB Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:15 CET - 13:30 CET
W02.3 Session 3: Ultra High Density of 3D, Monolithic 3D
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 15:15 CET
Session chair:
Saibal Mukhopadhyay, GeorgiaTech, US
Time | Label | Presentation Title Authors |
---|---|---|
13:30 CET | W02.3.1 | THIN-FILM BASED MONOLITHIC 3D SYSTEMS Speaker: Umesh Chand, NTU, SG |
13:50 CET | W02.3.2 | TEMPERATURE-AWARE MONOLITHIC 3D DNN ACCELERATORS FOR BIOMEDICAL APPLICATIONS Speaker: Prachi Shukla, Boston University, US |
14:10 CET | W02.3.3 | A COMPUTE-IN-MEMORY HARDWARE ACCELERATOR DESIGN WITH BEOL TRANSISTOR BASED RECONFIGURABLE INTERCONNECT Speaker: Shimeng Yu, GeorgiaTech, US |
14:30 CET | W02.3.4 | NANOSYSTEMS FOR ENERGY-EFFICIENT COMPUTING USING CARBON NANOTUBE FETS AND MONOLITHIC 3D INTEGRATION Speaker: Tathagata Srimani, Stanford University, US |
W02.CB Coffee Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:50 CET - 15:15 CET
W02.K KEYNOTE: 3D stacking architectures: opportunities for Augmented Reality applications
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:15 CET - 15:45 CET
Session chair:
Subhasish Mitra, Stanford University, US
Speaker:
Edith Beigné, META, US
W02.4 Session 4: 3D Design, Methodology and Thermal
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:45 CET - 17:05 CET
Session chair:
Peter Ramm, Fraunhofer, DE
Time | Label | Presentation Title Authors |
---|---|---|
15:45 CET | W02.4.3 | EDA TOOLS AND PPA TRADEOFF STUDIES FOR MICRO-BUMP AND HYBRID BOND 3D ICS Speaker: Sung-Kyu Lim, GeorgiaTech, US |
16:05 CET | W02.4.2 | HETEROGENEOUS PACKAGING DESIGN AND VERIFICATION WORKFLOWS Speaker: Anthony Mastroianni, SIEMENS EDA, US |
16:25 CET | W02.4.4 | TOWARDS A PLACE AND ROUTE FLOW FOR HIGH DENSITY 3D-ICS Speaker: Mauricio Altieri, CEA, FR |
16:45 CET | W02.4.1 | CHALLENGES AND OPPORTUNITIES FOR THERMALS IN HETEROGENEOUS 3D PACKAGING Speaker: Rajiv Mongia, INTEL, US |
W02.C Workshop Closing Remarks
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 17:05 CET - 17:15 CET
15.1 Young People Program: BarCamp
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 17:30 CET
Session chair:
Anton Klotz, Cadence, DE
Session co-chair:
Georg Glaeser, Institut für Mikroelektronik- und Mechatronik-Systeme, DE
The BarCamp is an open research meeting in which participants present, discuss and jointly develop ideas and results of their ongoing scientific work in an interactive way. Characterized by an informal atmosphere, the goal of the BarCamp is to generate new and out-of-the-box ideas, and to allow networking and interaction between participants.
W03.2 Projects related to Neuromorphic computing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 10:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
09:00 CET | W03.2.1 | PHOTONIC NEUROMORPHIC COMPUTING Speaker: Frank Brückerhoff-Plückelmann, University of Münster, DE |
09:30 CET | W03.2.2 | ALGORITHM-CIRCUITS-DEVICE CO-DESIGN FOR EDGE NEUROMORPHIC INTELLIGENCE – MEMSCALE PROJECT Speaker: Melika Payvand, University of Zurich and ETH Zurich, CH |
W05 Cross-layer algorithm & circuit design for signal processing with special emphasis on communication systems
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 14:40 CET
Organisers:
Raymond Leung, Huawei Technologies Co., Ltd., CN
Norbert Wehn, University of Kaiserslautern, DE
Leibin Ni, Huawei Technologies Co., Ltd., CN
Christian Weis, University of Kaiserslautern, DE
Speakers:
Shaodi Wang, Zhicun (WITINMEM) Technology Co. Ltd., CN
Qinhui Huang, Huawei Technologies Co., Ltd., CN
Zhihang Wu, Huawei Technologies Co., Ltd., CN
Chixiao Chen, Fudan University, CN
Zhiyi Yu, Sun-Yat Sen University, Zhuhai, CN
Ningyuan Yin, Sun-Yat Sen University, Zhuhai, CN
Kechao Huang, Huawei Technologies Co., Ltd., CN
Leibin Ni, Huawei Technologies Co., Ltd., CN
Keynote Speakers:
Andreas P. Burg, Ecole Polytechnique Federale de Lausanne (EPFL), CH
Stephan ten Brink, Institute of Telecommunications, University of Stuttgart, DE
Content/Context:
The scaling and evolution of semiconductor manufacturing technologies are triggering intense interdisciplinary and cross-layer activities; these have the potential to provide many benefits, such as much increased energy efficiency and resilience in the context of only partially reliable hardware circuit designs.
Signal processing, particularly in the field of communication systems, can benefit greatly from these developments due to 1) rapidly increasing energy efficiency requirements as a consequence of the demand for higher data rates and 2) an inherent fault tolerance of the underlying signal processing algorithms. In the context of a communication system, the reliability and robustness requirements can vary widely depending on the considered target application: while they are rather relaxed for wireless communication systems, due to a high overall acceptable error rate in the outcome of the processing, they are rather stringent in the field of optical communications, which necessitates operation at very low error rates. These specific characteristics and requirements foster cross-layer approaches that jointly consider the algorithm and the hardware circuit design.
The robustness of the hardware technology and processing architecture used (classical signal processing, ML computing, or in-memory processing), together with the resilience of the applied algorithm, determines the configuration and parameters of the complete application as well as the chosen algorithm. The further scaling of technology nodes and the slow-down of Moore's law may force designers to deeply revisit existing and future signal processing and communication systems, increasing the potential need for paradigm changes in the design of such systems.
This workshop aims at providing a forum to discuss challenges, trends, solutions and applications of these rapidly evolving cross-layer approaches for the algorithm and circuit design of communication and signal processing systems by gathering researchers and engineers from academia and industry; it also aims at creating a unique network of competence and experts in all aspects of cross-layer solutions and technologies including manufacturing reliability, architectures, design, algorithms, automation and test. The workshop will therefore give an opportunity for the contributors to share/discuss the state-of-the-art knowledge and their work in progress.
The topics that will be discussed in this workshop include but are not limited to:
- Cross-layer design approaches
- Approximate computing in signal processing and communication systems
- Algorithm design for communication systems and optimization for hardware implementation
- ML and CNN computing for communication and signal processing systems
- In-memory computing approaches for signal processing and communication systems
Keynote speakers:
- Professor Andreas P. Burg, Telecommunications Circuits Laboratory, Ecole Polytechnique Federale de Lausanne (EPFL)
- Professor Stephan ten Brink, Director Institute of Telecommunications (INÜ), University of Stuttgart
This workshop is supported by TU Kaiserslautern, Department of Electrical and Computer Engineering, Division of Microelectronic Systems Design
Participants can register for the workshop free of charge via the online registration platform.
TECHNICAL PROGRAM
W05.1 Welcome Address
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:00 CET - 09:10 CET
Speaker:
Norbert Wehn, TU Kaiserslautern, DE
W05.2 Keynote I and Invited Talk
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 09:10 CET - 10:30 CET
Session chair:
Norbert Wehn, TU Kaiserslautern, DE
Time | Label | Presentation Title Authors |
---|---|---|
09:10 CET | W05.2.1 | KEYNOTE: "ON THE CURSE AND THE BEAUTY OF RANDOMNESS FOR PROVIDING RELIABLE QUALITY GUARANTEES WITH UNRELIABLE SILICON" Keynote Speaker: Andreas P. Burg, Ecole Polytechnique Federale de Lausanne (EPFL), CH Abstract Abstract: Silicon implementations of complex algorithms (for communications and other applications) are burdened by extensive safety margins to ensure 100% reliable operation. These margins limit voltage scaling at the cost of energy/power consumption and require conservative layout rules such as double-fins or the use of static memories for storage that are costly in area. "Approximate computing" or "computing on unreliable silicon" promotes the idea to compromise reliability and tolerate occasional errors or parameter variations for the benefit of area and power/energy. This idea is especially relevant for applications such as communications or machine learning, where systems are anyway tolerant to noise or apply only stochastic quality metrics such as BER, FER, PSNR, or MSE (see the illustrative sketch after this table). Bio: Andreas Burg (S'97-M'05) was born in Munich, Germany, in 1975. He received his Dipl.-Ing. degree from the Swiss Federal Institute of Technology (ETH) Zurich, Switzerland, in 2000, and the Dr. sc. techn. degree from the Integrated Systems Laboratory of ETH Zurich, in 2006. |
10:00 CET | W05.2.2 | INVITED TALK: "IMPLEMENTATION OF MULTI-HUNDRED-GIGABIT THROUGHPUT OPTICAL FEC CODEC WITH NON-REFRESH EDRAM" Speakers: Qinhui Huang and Kechao Huang, Huawei Technologies Co., Ltd., CN Abstract Abstract: Forward-error-correction codes (FECs) are essential elements in the field of optical communication to deliver ultra-reliable transmission. In the last decade, communication engineers have become unprecedentedly eager for power-efficient FECs due to the slow-down of Moore's law and the increasing demand for data rate. Typically, the area and power consumption of today's high-speed FECs are dominated by memories. Embedded DRAM (eDRAM) is a promising approach to deal with this issue due to its lower transistor count. Algorithm and circuit can be co-designed, and the refresh module can be removed in such a domain-specific eDRAM. By using non-refresh eDRAM instead of conventional SRAM, significant power reduction and area saving can be achieved in high-speed FECs. |
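Purely as an editorial illustration of the stochastic quality metrics named in the keynote abstract above (BER, MSE and the like), the following minimal Python sketch injects random bit flips into a quantized signal buffer and reports the resulting metrics. The error model, flip probability and all function names are hypothetical assumptions made for this example and are not taken from the talk.

```python
import numpy as np

def inject_bit_flips(samples, bits=8, p_flip=1e-3, rng=None):
    """Flip each stored bit of an unsigned fixed-point buffer with probability
    p_flip -- a toy stand-in for errors caused by aggressive voltage scaling."""
    rng = np.random.default_rng() if rng is None else rng
    flips = rng.random((samples.size, bits)) < p_flip           # which bits flip
    masks = (flips * (1 << np.arange(bits))).sum(axis=1).astype(samples.dtype)
    return samples ^ masks

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=10_000, dtype=np.uint8)       # 8-bit samples
noisy = inject_bit_flips(clean, bits=8, p_flip=1e-3, rng=rng)

ber = np.unpackbits(clean ^ noisy).mean()                        # bit error rate
mse = np.mean((clean.astype(float) - noisy.astype(float)) ** 2)  # distortion
print(f"BER = {ber:.2e}, MSE = {mse:.2f}")
```

Raising p_flip mimics more aggressive voltage scaling: the BER grows roughly proportionally, while the MSE depends on which bit positions happen to flip.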
W05.3 Coffee Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:30 CET - 10:45 CET
W05.4 Keynote II and Invited Talk
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:45 CET - 12:00 CET
Session chair:
Christian Weis, University of Kaiserslautern, DE
Time | Label | Presentation Title Authors |
---|---|---|
10:45 CET | W05.4.1 | KEYNOTE: "DEEP LEARNING APPLICATIONS IN WIRELESS COMMUNICATIONS BASED ON DISTRIBUTED MASSIVE MIMO CHANNEL SOUNDING DATA" Keynote Speaker: Stephan ten Brink, Institute of Telecommunications, University of Stuttgart, DE Abstract Abstract: A distributed massive MIMO channel sounder for acquiring CSI datasets is presented. The measured data has several applications in the study of different machine learning algorithms. Each individual single-antenna receiver is completely autonomous, enabling arbitrary grouping into spatially distributed antenna deployments, and offering virtually unlimited scalability in the number of antennas. Some of the deep learning applications presented include absolute and relative user localization like “channel charting”, and CSI inference for UL/DL FDD massive MIMO operation. Bio: Stephan ten Brink has been a faculty member at the University of Stuttgart, Germany, since July 2013, where he is head of the Institute of Telecommunications. From 1995 to 1997 and 2000 to 2003, Dr. ten Brink was with Bell Laboratories in Holmdel, New Jersey, conducting research on multiple antenna systems. From July 2003 to March 2010, he was with Realtek Semiconductor Corp., Irvine, California, as Director of the wireless ASIC department, developing WLAN and UWB single chip MAC/PHY CMOS solutions. In April 2010 he returned to Bell Laboratories as Department Head of the Wireless Physical Layer Research Department in Stuttgart, Germany. Dr. ten Brink is an IEEE Fellow, and recipient and co-recipient of several awards, including the Vodafone Innovation Award, the IEEE Stephen O. Rice Paper Prize, and the IEEE Communications Society Leonard G. Abraham Prize for contributions to channel coding and signal detection for multiple-antenna systems. He is best known for his work on iterative decoding (EXIT charts), MIMO communications (soft sphere detection, massive MIMO), and deep learning applied to communications. |
11:30 CET | W05.4.2 | INVITED TALK: "COMMUNICATION-AWARE CROSS-LAYER CODESIGN STRATEGY FOR ENERGY EFFICIENT MACHINE LEARNING SOC" Speaker: Chixiao Chen, Fudan University, CN Abstract Abstract: With the great success of artificial intelligence algorithms, machine learning SoCs have recently become a significant class of high-performance processors. However, the limited power budget of edge devices cannot support GPUs and intensive DRAM access. The talk will discuss multiple energy-efficient codesign examples that avoid power-hungry hardware. First, on-chip incremental learning is performed on an SoC without dedicated backpropagation computing, where algorithm-architecture codesign is involved. Second, low bit-width quantization schemes are applied to a computing-in-memory based SoC, where algorithm-circuit codesign is investigated. Moreover, data flow optimization is mapped onto a multi-chiplet-module system, where architecture-package codesign is discussed. |
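As a rough, purely illustrative companion to the low bit-width quantization mentioned in the invited talk above, the Python sketch below applies uniform symmetric 4-bit quantization to a weight matrix and evaluates the integer matrix-vector product that a computing-in-memory macro would perform, rescaling afterwards. The bit width, tensor shapes and function names are assumptions chosen for the example, not details of the talk.

```python
import numpy as np

def quantize_symmetric(w, n_bits=4):
    """Uniform symmetric quantization of a weight tensor to n_bits signed
    integers, as might precede mapping onto a CIM macro (illustrative only)."""
    q_max = 2 ** (n_bits - 1) - 1                   # e.g. 7 for 4-bit weights
    scale = np.abs(w).max() / q_max
    w_q = np.clip(np.round(w / scale), -q_max, q_max).astype(np.int8)
    return w_q, scale

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64)).astype(np.float32)    # hypothetical layer weights
w_q, scale = quantize_symmetric(w, n_bits=4)

x = rng.normal(size=64).astype(np.float32)
y_ref = w @ x                                       # full-precision reference
y_cim = (w_q @ x) * scale                           # integer MACs, rescaled after
print("relative error:", np.linalg.norm(y_ref - y_cim) / np.linalg.norm(y_ref))
```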
W05.5 Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:00 CET - 13:00 CET
W05.6 Invited Talks: From Pass-Transistor-Logic to Computing-In-Memory
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:00 CET - 14:30 CET
Session chairs:
Leibin Ni, Huawei Technologies Co., Ltd., CN
Christian Weis, University of Kaiserslautern, DE
Time | Label | Presentation Title Authors |
---|---|---|
13:00 CET | W05.6.1 | INVITED TALK I: "RESEARCH AND DESIGN OF PASS TRANSISTOR BASED MULTIPLIERS AND THEIR DESIGN FOR TEST FOR CONVOLUTIONAL NEURAL NETWORK COMPUTATION" Speakers: Zhiyi Yu and Ningyuan Yin, Sun-Yat Sen University, Zhuhai, CN Abstract Abstract: Convolutional Neural Networks (CNNs) use different bit widths at different layers and have been widely used in mobile and embedded applications. The implementation of a CNN may include multipliers, which can incur large overheads and suffer from a high timing error rate due to their large delay. The pass-transistor-logic (PTL) based multiplier is a promising solution to such issues: it uses fewer transistors, and it reduces the number of gates in the critical path and thus the worst-case delay, so the timing error rate is reduced. In this talk, we present PTL-based multipliers and their design for test (DFT). An error model is built to analyze the error rate and to help with DFT. According to the simulation results, compared to a traditional CMOS-based multiplier, the energy per operation (J/OP) of PTL multipliers can be reduced by over 20%. |
13:30 CET | W05.6.2 | INVITED TALK II: "WTM2101: COMPUTING-IN-MEMORY SOC" Speaker: Shaodi Wang, Zhicun (WITINMEM) Technology Co. Ltd., CN Abstract Abstract: In this talk, we will introduce an ultra-low-power neural processing SoC chip with computing-in-memory technology. We have designed, fabricated, and tested chips based on a nonvolatile floating-gate technology node. The chip simultaneously solves the data processing and communication bottlenecks in NNs. Furthermore, thanks to the nonvolatility of the floating-gate cell, the computing-in-memory macros can be powered down during the idle state, which saves leakage power for IoT uses, e.g., voice command recognition. The chip supports multiple NNs, including DNN, TDNN, and RNN, for different applications. |
14:00 CET | W05.6.3 | INVITED TALK III: "IMPLEMENTATION AND PERFORMANCE ANALYSIS OF COMPUTING-IN-MEMORY TOWARDS COMMUNICATION SYSTEMS" Speakers: Zhihang Wu and Leibin Ni, Huawei Technologies Co., Ltd., CN Abstract Abstract: Computing-in-memory (CIM) is an emerging technique to overcome the memory-wall bottleneck. It can reduce data movement between memory and processor and bring significant power reduction to neural network accelerators, especially in edge devices. Communication systems face power and heat dissipation problems when implementing DSP algorithms in ASICs, so applying the CIM technique to communication systems to improve energy efficiency could have a great impact. The talk will discuss the computing-in-memory technique for communication systems. As examples, some DSP modules (such as FIR, MIMO and FEC) will be re-organized and mapped onto computing-in-memory units. |
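As a toy illustration of the kind of mapping described in the last abstract above, the following Python snippet writes a FIR filter as the matrix-vector products a computing-in-memory crossbar naturally evaluates: the taps form the stored vector and each input window drives the rows. The filter, sizes and function names are hypothetical and chosen only for this example.

```python
import numpy as np

def fir_direct(x, h):
    """Reference FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    return np.convolve(x, h, mode="valid")

def fir_as_matvec(x, h):
    """The same FIR written as matrix-vector products -- the form a
    computing-in-memory crossbar evaluates: the taps are the stored vector,
    each reversed input window is applied to the rows."""
    k = len(h)
    windows = np.stack([x[n:n + k][::-1] for n in range(len(x) - k + 1)])
    return windows @ h                               # one MAC pass per output

rng = np.random.default_rng(2)
x = rng.normal(size=32)
h = np.array([0.25, 0.5, 0.25])                      # hypothetical 3-tap filter
assert np.allclose(fir_direct(x, h), fir_as_matvec(x, h))
print(fir_as_matvec(x, h)[:5])
```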
W05.7 Closing Notes
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:30 CET - 14:40 CET
Speakers:
Christian Weis, University of Kaiserslautern, DE
Leibin Ni, Huawei Technologies Co., Ltd., CN
W01.T1 Technical Session 1 - Applications, Machine Learning, and System-level Test
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:00 CET - 11:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
10:00 CET | W01.T1.1 | TOWARDS FAST AND EFFICIENT SCENARIO GENERATION FOR AUTONOMOUS VEHICLES Speaker: Haolan Liu, University of California, US |
10:15 CET | W01.T1.2 | DEEP LEARNING BASED DRIVER MODEL AND FAULT DETECTION FOR AUTOMATED RACECAR SYSTEM TESTING Speaker: Yousef Abdulhammed, BMW Motorsport, DE |
10:30 CET | W01.T1.3 | UNSUPERVISED CLUSTERING OF ACOUSTIC EMISSION SIGNALS FOR SEMICONDUCTOR THIN LAYER CRACK DETECTION AND DAMAGE EVENT INTERPRETATION Speaker: Sarah Seifi, Infineon Technologies AG, DE |
10:45 CET | W01.T1.4 | ONLINE SCHEDULING OF MEMORY BISTS EXECUTION AT REAL-TIME OPERATING-SYSTEM LEVEL Speaker: Paolo Bernardi, Politecnico di Torino, IT |
W03.3 Materials and Devices
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:30 CET - 12:30 CET
Time | Label | Presentation Title Authors |
---|---|---|
10:30 CET | W03.3.1 | MODELING UNCONVENTIONAL NANOSCALED DEVICE FABRICATION – MUNDFAB PROJECT Speaker: Peter Pichler, Fraunhofer IISB, DE |
11:10 CET | W03.3.2 | RESISTANCE SWITCHING MATERIALS AND DEVICES FOR NEUROMORPHIC COMPUTING Speaker: Sabina Spiga, CNR - IMM, IT |
11:50 CET | W03.3.3 | OSCILLATING NEURAL NETWORKS POWERED BY PHASE-TRANSITION VO2 NANODEVICES Speaker: Oliver Maher, IBM, CH |
W05.3 Coffee Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:30 CET - 10:45 CET
W05.4 Keynote II and Invited Talk
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:45 CET - 12:00 CET
Session chair:
Christian Weis, University of Kaiserslautern, DE
Time | Label | Presentation Title Authors |
---|---|---|
10:45 CET | W05.4.1 | KEYNOTE: "DEEP LEARNING APPLICATIONS IN WIRELESS COMMUNICATIONS BASED ON DISTRIBUTED MASSIVE MIMO CHANNEL SOUNDING DATA" Keynote Speaker: Stephan ten Brink, Institute of Telecommunications, University of Stuttgart, DE Abstract Abstract: A distributed massive MIMO channel sounder for acquiring CSI datasets is presented. The measured data has several applications in the study of different machine learning algorithms. Each individual single-antenna receiver is completely autonomous, enabling arbitrary grouping into spatially distributed antenna deployments, and offering virtually unlimited scalability in the number of antennas. Some of the deep learning applications presented include absolute and relative user localization like “channel charting”, and CSI inference for UL/DL FDD massive MIMO operation. Bio: Stephan ten Brink has been a faculty member at the University of Stuttgart, Germany, since July 2013, where he is head of the Institute of Telecommunications. From 1995 to 1997 and 2000 to 2003, Dr. ten Brink was with Bell Laboratories in Holmdel, New Jersey, conducting research on multiple antenna systems. From July 2003 to March 2010, he was with Realtek Semiconductor Corp., Irvine, California, as Director of the wireless ASIC department, developing WLAN and UWB single chip MAC/PHY CMOS solutions. In April 2010 he returned to Bell Laboratories as Department Head of the Wireless Physical Layer Research Department in Stuttgart, Germany. Dr. ten Brink is an IEEE Fellow, and recipient and co-recipient of several awards, including the Vodafone Innovation Award, the IEEE Stephen O. Rice Paper Prize, and the IEEE Communications Society Leonard G. Abraham Prize for contributions to channel coding and signal detection for multiple-antenna systems. He is best known for his work on iterative decoding (EXIT charts), MIMO communications (soft sphere detection, massive MIMO), and deep learning applied to communications. |
11:30 CET | W05.4.2 | INVITED TALK: "COMMUNICATION-AWARE CROSS-LAYER CODESIGN STRATEGY FOR ENERGY EFFICIENT MACHINE LEARNING SOC" Speaker: Chixiao Chen, Fudan University, CN Abstract Abstract: With the great success of artificial intelligence algorithms, machine learning SoCs have recently become a significant class of high-performance processors. However, the limited power budget of edge devices cannot support GPUs and intensive DRAM access. The talk will discuss multiple energy-efficient codesign examples that avoid power-hungry hardware. First, on-chip incremental learning is performed on an SoC without dedicated backpropagation computing, where algorithm-architecture codesign is involved. Second, low bit-width quantization schemes are applied to a computing-in-memory based SoC, where algorithm-circuit codesign is investigated. Moreover, data flow optimization is mapped onto a multi-chiplet-module system, where architecture-package codesign is discussed. |
W10.B1 Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 10:45 CET - 11:00 CET
W01.2 Invited Session 1: Design-for-dependability for AI hardware accelerators in the edge
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:00 CET - 12:00 CET
Organiser:
Haralampos-G. Stratigopoulos, Sorbonne Universités, CNRS, LIP6, FR
Abstract: AI has seen an explosion in real-world applications in recent years. For example, it is the backbone of self-driving and connected cars. The design of AI hardware accelerators to support the compute-intensive and memory-hungry AI workloads is an on-going effort aiming at optimizing the energy-area trade-off. This special session will focus on dependability aspects in the design of AI hardware accelerators. It is often tacitly assumed that neural networks on hardware inherit the remarkable fault tolerance capabilities of the biological brain. In recent years, this assumption has been proven false by a number of fault injection experiments. The three talks will cover reliability assessment and fault tolerance of Artificial Neural Networks and Spiking Neural Networks implemented in hardware, as well as the impact of approximate computing on fault tolerance capabilities.
Presentations:
- Fault Tolerance of Neural Network Hardware Accelerators for Autonomous Driving
Adrian Evans (CEA-Leti, Grenoble, France), Lorena Anghel (Grenoble-INP, SPINTEC, Grenoble, France), and Stéphane Burel (CEA-Leti, Grenoble, France)
- Exploiting Approximate Computing for Efficient and Reliable Convolutional Neural Networks
Alberto Bosio (École Centrale de Lyon, INL, Lyon, France)
- Reliability Assessment and Fault Tolerance of Spiking Neural Network Hardware Accelerators
Haralampos-G. Stratigopoulos (Sorbonne University, CNRS, LIP6, Paris, France)
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | W01.2.1 | FAULT TOLERANCE OF NEURAL NETWORK HARDWARE ACCELERATORS FOR AUTONOMOUS DRIVING Speaker: Adrian Evans, CEA-Leti, FR |
11:20 CET | W01.2.2 | EXPLOITING APPROXIMATE COMPUTING FOR EFFICIENT AND RELIABLE CONVOLUTIONAL NEURAL NETWORKS Speaker: Alberto Bosio, École Centrale de Lyon, FR |
11:40 CET | W01.2.3 | RELIABILITY ASSESSMENT AND FAULT TOLERANCE OF SPIKING NEURAL NETWORK HARDWARE ACCELERATORS Speaker: Haralampos-G. Stratigopoulos, Sorbonne Universités, CNRS, LIP6, FR |
W04.2 Open Source Software & Support Technologies for Open Source Hardware
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:00 CET - 12:30 CET
Session chair:
Jérôme Quévremont, Thales, FR
Besides the IP blocks addressed in the previous session, open source hardware requires several software technologies that benefit from open source collaboration. First of all, software development tools such as compilers, debuggers, bootloaders, operating systems, runtime frameworks, etc. Then design tools such as FPGA compilers, CAD tools, simulators, emulators, validation tools, etc. Last but not least, security benefits from open source at every level of the hardware and software stack. This session will report on several contributions to these support technologies and discuss the main upcoming issues in this field.
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | W04.2.1 | SUPPORTING OPEN SOURCE HARDWARE TO ENABLE COMMERCIAL ADOPTION Presenter: Michael Gielda, Antmicro, PL |
11:15 CET | W04.2.2 | ADDING SUPPORT FOR RISC-V “V” VECTOR EXTENSION IN LLVM. Presenter: Roger Ferrer Ibanez, BSC, ES |
11:30 CET | W04.2.3 | IMPROVING SECURITY THROUGH OPEN SOURCE HARDWARE Presenter: Stefan Mangard, TU Graz, AT |
11:45 CET | W04.2.4 | SIMULATION OF RISC-V 128-BIT EXTENSION IN QEMU Presenter: Frédéric Pétrot, Grenoble INP/TIMA, FR |
12:00 CET | W04.2.5 | PANEL WITH THE SPEAKERS ON OPEN SOFTWARE & SUPPORT TECHNOLOGIES FOR OPEN SOURCE HARDWARE Panellists: Michael Gielda1, Roger Ferrer Ibanez2, Stefan Mangard3 and Frédéric Pétrot4 1Antmicro, PL; 2BSC, ES; 3TU Graz, AT; 4Grenoble INP/TIMA, FR Moderator: Jérôme Quévremont, Thales, FR Abstract Q&A session with the audience and panel discussion on Open Software & Support Technologies for Open Source Hardware. |
W10.S3 Hardware and Components
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:00 CET - 12:15 CET
Session chair:
Sebastian Steinhorst, TU Munich, DE
Speaker & Panellists:
Bart Vermeulen, NXP, NL
Hai (Helen) Li, Duke University, US
Luca Benini, ETH Zurich, CH
For the design of autonomous systems, powerful hardware and system components are as much a core enabler of advanced autonomy as the software running on them. With the increase in cognitive capabilities through the integration of computation and sensing platforms, many opportunities as well as challenges arise in making hardware and AI-centric software operate fully synergistically and, hence, reach their full potential. The purpose of this session is to discuss the latest trends in hardware components and their design aspects for the efficient and holistic integration of computation, sensing and communication.
Agenda of Talks and Speakers/Panelists:
- 11:00 - 11:15: "PULP: An Open Ultra Low-Power Platform for Autonomous Nano Drones", Luca Benini, ETH Zürich, CH
- 11:15 - 11:30: "Challenges & Solutions for Next-Generation E/E Architectures", Bart Vermeulen, NXP, NL
- 11:30 - 11:45: "Efficient Machine Learning: Algorithms-Circuits-Devices Co-design", Hai (Helen) Li, Duke University, US
- 11:45 - 12:15: Interactive panel discussion on Hardware and Components
W02.2 Session 2: 3D and Image Sensors
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:15 CET - 12:15 CET
Session chair:
Eric Ollier, CEA, FR
Time | Label | Presentation Title Authors |
---|---|---|
11:15 CET | W02.2.1 | EFFICIENT IMAGE FEATURE EXTRACTION EXPLOITING MASSIVE PARALLELISM THROUGH 3D-INTEGRATION Speaker: Ricardo Carmona-Galán, CSIC-University of Seville, ES |
11:35 CET | W02.2.2 | 3D INTEGRATION FOR SMART EVENT VISION SENSORS Speaker: Jean-Luc Jaffard, PROPHESEE, FR |
11:55 CET | W02.2.3 | 3D-STACKED CMOS IMAGE SENSORS FOR HIGH PERFORMANCE INDIRECT TIME-OF-FLIGHT Speaker: Cedric Tubert, STMicroelectronics, FR |
W09.3 Keynote 2
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 11:30 CET - 12:30 CET
Session chair:
Francesco Regazzoni, University of Amsterdam, NL and Università della Svizzera italiana, CH
Time | Label | Presentation Title Authors |
---|---|---|
11:30 CET | W09.3.1 | SUSTAINABLE SECURITY: WHAT DO WE SUSTAIN, AND FOR WHOM? Keynote Speaker: Jan Tobias Muehlberg, IMEC-DistriNet, KU Leuven, BE |
W01.T2 Technical Session 2 - Testing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:00 CET - 13:00 CET
Session Chair:
Melanie Schillinsky, NXP, DE
Time | Label | Presentation Title Authors |
---|---|---|
12:00 CET | W01.T2.1 | PERCEPTION AND REALITY CHECK INTO V-STRESS FOR SCREENING DEFECTIVE PARTS IN AUTOMOTIVE RELIABILITY Speaker: Lieyi Sheng, onsemi, BE |
12:15 CET | W01.T2.2 | POWER CYCLING BODY DIODE CURRENT FLOW ON SIC MOSFET DEVICE Speaker: Giovanni Corrente, STMicroelectronics, IT |
12:30 CET | W01.T2.3 | REDUCING ROUTING OVERHEAD USING NATURAL LOOPS Speaker: Tobias Kilian, TU Munich / Infineon Technologies AG, DE |
12:45 CET | W01.T2.4 | A NOVEL METHOD FOR DISCOVERING ELECTRICALLY EQUIVALENT DEFECTS IN ANALOG/MIXED-SIGNAL CIRCUITS Speaker: Mayukh Bhattacharya, Synopsys, US |
W05.5 Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:00 CET - 13:00 CET
W02.LB Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:15 CET - 13:30 CET
W10.B2 Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:15 CET - 13:30 CET
W03.LB Lunch Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 12:30 CET - 13:30 CET
W05.6 Invited Talks: From Pass-Transistor-Logic to Computing-In-Memory
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:00 CET - 14:30 CET
Session chairs:
Leibin Ni, Huawei Technologies Co., Ltd., CN
Christian Weis, University of Kaiserslautern, DE
Time | Label | Presentation Title Authors |
---|---|---|
13:00 CET | W05.6.1 | INVITED TALK I: "RESEARCH AND DESIGN OF PASS TRANSISTOR BASED MULTIPLIERS AND THEIR DESIGN FOR TEST FOR CONVOLUTIONAL NEURAL NETWORK COMPUTATION" Speakers: Zhiyi Yu and Ningyuan Yin, Sun Yat-sen University, Zhuhai, CN Abstract Abstract: Convolutional Neural Networks (CNNs) use different bit widths at different layers and have been widely adopted in mobile and embedded applications. The implementation of a CNN may include multipliers, which can incur large overheads and suffer from a high timing error rate due to their large delay. The pass-transistor logic (PTL) based multiplier is a promising solution to these issues. It uses fewer transistors and reduces the number of gates in the critical path, thus reducing the worst-case delay; as a result, the timing error rate is reduced. In this talk, we present PTL-based multipliers and their design for test (DFT). An error model is built to analyze the error rate and to help with DFT. According to the simulation results, compared to a traditional CMOS-based multiplier, the energy per operation (Joules per operation, J/OP) of PTL multipliers could be reduced by over 20%. |
13:30 CET | W05.6.2 | INVITED TALK II: "WTM2101: COMPUTING-IN-MEMORY SOC" Speaker: Shaodi Wang, Zhicun (WITINMEM) Technology Co. Ltd., CN Abstract Abstract: In this talk, we will introduce an ultra-low-power neural processing SoC chip with computing-in-memory technology. We have designed, fabricated, and tested chips based on nonvolatile floating-gate technology nodes. It simultaneously solves the data processing and communication bottlenecks in NNs. Furthermore, thanks to the nonvolatility of the floating-gate cell, the computing-in-memory macros can be powered down during the idle state, which saves leakage power in IoT uses, e.g., voice command recognition. The chip supports multiple NNs including DNN, TDNN, and RNN for different applications. |
14:00 CET | W05.6.3 | INVITED TALK III: "IMPLEMENTATION AND PERFORMANCE ANALYSIS OF COMPUTING-IN-MEMORY TOWARDS COMMUNICATION SYSTEMS" Speakers: Zhihang Wu and Leibin Ni, Huawei Technologies Co., Ltd., CN Abstract Abstract: Computing-in-memory (CIM) is an emerging technique to solve the memory-wall bottleneck. It reduces data movement between memory and processor and achieves significant power reduction in neural network accelerators, especially in edge devices. Communication systems face power and heat dissipation problems when implementing DSP algorithms in ASICs, so applying the CIM technique to communication systems to improve energy efficiency would have a great impact. The talk will discuss the computing-in-memory technique for communication systems. As examples, some DSP modules (such as FIR, MIMO and FEC) will be re-organized and mapped onto computing-in-memory units. |
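As a toy illustration of such a mapping (not taken from the talk; the filter taps, input samples, and variable names are chosen purely for illustration), the sketch below expresses an FIR filter as the matrix-vector product that a compute-in-memory macro natively performs, with the filter taps stored as rows of a weight matrix:

```python
# Illustrative sketch: an FIR filter re-organized as a matrix-vector product,
# the operation a compute-in-memory (CIM) array performs natively.
import numpy as np

taps = np.array([0.25, 0.5, 0.25])        # FIR coefficients ("weights in memory")
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input samples (applied as the vector)

n_out = len(x) - len(taps) + 1
# Each row holds the reversed taps aligned with one output sample (Toeplitz structure).
H = np.zeros((n_out, len(x)))
for i in range(n_out):
    H[i, i:i + len(taps)] = taps[::-1]

y_cim = H @ x                               # what the CIM macro would compute
y_ref = np.convolve(x, taps, mode="valid")  # reference FIR output
print(np.allclose(y_cim, y_ref))            # True
```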
W06 Data-driven applications for industrial and societal challenges: Problems, methods, and computing platforms
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:00 CET - 18:00 CET
Organisers:
Jeronimo Castrillon, TU Dresden, DE
Christoph Hagleitner, IBM Research -- Zurich Research Laboratory, CH
Christian Pilato, Politecnico di Milano, IT
Speakers:
Diana Göhringer, TU Dresden, DE
Hannes Vogt, ETH Zurich / CSCS, CH
Tobias Grosser, University of Edinburgh, GB
Nuria de Lama, Consulting Director, EU Government Consulting IDC, ES
Gianluca Palermo, Politecnico di Milano - DEIB, IT
Kentaro Sano, Center for Computational Science, RIKEN, JP
Luca Carloni, Columbia University, US
Zhiru Zhang, Cornell University, US
This workshop is supported by
TU Dresden, Chair for Compiler Construction
Participants can register for the workshop free of charge via the online registration platform.
W06.1 Big data, HPC and FPGAs
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:00 CET - 15:15 CET
Session chair:
Fabrizio Ferrandi, Politecnico di Milano, IT
This session gives an overview of the challenges of modern big-data applications, reports on state-of-the-art FPGA-based HPC systems, describes an open-source reconfigurable hardware ecosystem and design flow, and closes with recent research on hardware acceleration for sparse big-data workloads.
Time | Label | Presentation Title Authors |
---|---|---|
13:00 CET | W06.1.1 | WORKSHOP INTRODUCTION Organisers: Jeronimo Castrillon1, Christoph Hagleitner2 and Christian Pilato3 1TU Dresden, DE; 2IBM Research -- Zurich Research Laboratory, CH; 3Politecnico di Milano, IT Abstract Brief introduction by the organizers on the motivation for as well as the format and the contents of the DATA-DREAM workshop in DATE 2022. |
13:15 CET | W06.1.2 | EVOLUTION OF THE DATA MARKET: HIGHLIGHTS AND PROJECTIONS Speaker: Nuria de Lama, Consulting Director, EU Government Consulting IDC, ES Abstract Taking decisions in the context of the data economy requires a deep understanding of the market and its projections. Investment in technologies should consider the value of indicators associated to the potential growth of the market, competitors, size of the ecosystem or a view on the skills gap. This presentation will offer updated figures elaborated by the Data Market Study run by IDC for 2021-2023 and will position the figures in a set of potential scenarios that will define the performance of the EU in a data-driven economy. Attendees will learn about the value of indicators such as data professionals and the skills gap, data companies, data suppliers, data economy, the value of the data market or the international dimension bringing some knowledge on markets outside the EU (US, Brazil, Japan, China). |
13:45 CET | W06.1.3 | SYSTEM AND APPLICATIONS OF FPGA CLUSTER "ESSPER" FOR RESEARCH ON RECONFIGURABLE HPC Speaker: Kentaro Sano, Center for Computational Science, RIKEN, JP Abstract At RIKEN Center for Computational Science (R-CCS), we have been developing an experimental FPGA cluster named "ESSPER (Elastic and Scalable System for high-PErformance Reconfigurable computing)," which is a research platform for reconfigurable HPC. ESSPER is composed of sixteen Intel Stratix 10 SX FPGAs which are connected to each other by a dedicated 100Gbps inter-FPGA network. We have developed our own Shell (SoC) and its software APIs for the FPGAs supporting inter-FPGA communication. The FPGA host servers are connected to a 100Gbps InfiniBand switch, which allows distant servers to remotely access the FPGAs by using a software-bridged version of Intel's OPAE FPGA driver, called R-OPAE. Through the 100Gbps InfiniBand network and R-OPAE, ESSPER is connected to the world's fastest supercomputer, Fugaku, deployed at RIKEN, so that, using Fugaku, we can program bitstreams onto the FPGAs remotely via R-OPAE and off-load tasks to the FPGAs. In this talk, I introduce ESSPER's concept, its hardware and software system stack, the programming environment, applications under development, as well as our future prospects for reconfigurable HPC. |
14:15 CET | W06.1.4 | OPEN-SOURCE HARDWARE FOR HETEROGENEOUS COMPUTING Speaker: Luca Carloni, Columbia University , US Abstract Information technology has entered the age of heterogeneous computing. Across a variety of application domains, computer systems rely on highly heterogeneous architectures that combine multiple general-purpose processors with many specialized hardware accelerators. The complexity of these systems, however, threatens to widen the gap between the capabilities provided by semiconductor technologies and the productivity of computer engineers. Open-source hardware is a promising avenue to address this challenge by enabling design reuse and collaboration. ESP is an open-source research platform for system-on-chip design that combines a scalable tile-based architecture and a flexible system-level design methodology. Conceived as a heterogeneous system integration platform, ESP is intrinsically suited to foster collaborative engineering across the open-source hardware community. |
14:45 CET | W06.1.5 | NEAR-MEMORY HARDWARE ACCELERATION OF SPARSE WORKLOADS Speaker: Zhiru Zhang, Cornell University, US Abstract Sparse linear algebra operations are widely used in numerous application domains such as graph processing, machine learning, and scientific computing. These operations are typically more challenging to accelerate due to low operational intensity and irregular data access patterns. This talk presents our recent investigation into near-memory hardware acceleration for sparse processing. Specifically, I will discuss the importance of co-designing the sparse storage format and accelerator architecture to maximize the bandwidth utilization and compute occupancy. As a case study, I will introduce GraphLily, a graph linear algebra overlay for accelerating graph processing on HBM-equipped FPGAs. GraphLily supports a rich set of graph algorithms by adopting the GraphBLAS programming interface, which formulates graph algorithms as sparse linear algebra operations. |
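To make the GraphBLAS formulation mentioned in the last talk above more concrete, here is a minimal illustrative sketch (not from the talk; it uses a toy graph and plain SciPy routines in place of GraphBLAS or GraphLily primitives) that expresses breadth-first search as one sparse matrix-vector product per frontier level:

```python
# Illustrative sketch: BFS as repeated sparse matrix-vector products (SpMV),
# the linear-algebra view of graph traversal used by GraphBLAS-style systems.
import numpy as np
import scipy.sparse as sp

# Toy directed graph: 0->1, 0->2, 1->3, 2->3, 3->4
rows = [0, 0, 1, 2, 3]
cols = [1, 2, 3, 3, 4]
A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(5, 5))

def bfs_levels(A, source):
    """Return the BFS level of each vertex (-1 if unreachable)."""
    n = A.shape[0]
    levels = np.full(n, -1)        # -1 means "not reached yet"
    frontier = np.zeros(n)
    frontier[source] = 1.0
    level = 0
    while frontier.any():
        levels[frontier > 0] = level
        # One SpMV per BFS level: A^T * frontier reaches the out-neighbours of
        # the current frontier; the mask keeps only unvisited vertices.
        reached = A.T @ frontier
        frontier = np.where(levels == -1, reached, 0.0)
        level += 1
    return levels

print(bfs_levels(A, 0))  # -> [0 1 1 2 3]
```

Accelerators such as the GraphLily overlay described above target exactly this kind of sparse matrix-vector kernel, which is why the co-design of sparse storage format and architecture matters for bandwidth utilization.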
W02.3 Session 3: Ultra High Density of 3D, Monolithic 3D
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 15:15 CET
Session chair:
Saibal Mukhopadhyay, GeorgiaTech, US
Time | Label | Presentation Title Authors |
---|---|---|
13:30 CET | W02.3.1 | THIN-FILM BASED MONOLITHIC 3D SYSTEMS Speaker: Umesh Chand, NTU, SG |
13:50 CET | W02.3.2 | TEMPERATURE-AWARE MONOLITHIC 3D DNN ACCELERATORS FOR BIOMEDICAL APPLICATIONS Speaker: Prachi Shukla, Boston University, US |
14:10 CET | W02.3.3 | A COMPUTE-IN-MEMORY HARDWARE ACCELERATOR DESIGN WITH BEOL TRANSISTOR BASED RECONFIGURABLE INTERCONNECT Speaker: Shimeng Yu, GeorgiaTech, US |
14:30 CET | W02.3.4 | NANOSYSTEMS FOR ENERGY-EFFICIENT COMPUTING USING CARBON NANOTUBE FETS AND MONOLITHIC 3D INTEGRATION Speaker: Tathagata Srimani, Stanford University, US |
W03.4 Demonstrators
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 14:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
13:30 CET | W03.4.1 | NEURONN LIVE DEMONSTRATORS Presenter: Madeleine Abernot, CNRS, FR |
13:30 CET | W03.4.2 | NEURONN LIVE DEMONSTRATORS Presenter: Theophile Gonos, A.I.Mergence, FR |
13:30 CET | W03.4.3 | NEURONN LIVE DEMONSTRATORS Speaker: Thierry Gil, CNRS, FR |
W04.3 Keynote - RISC-V Open Era of Computing: Innovation, adoption, and opportunity in Europe and beyond
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 14:00 CET
Keynote Speaker:
Calista Redmond, RISC-V International, US
RISC-V is the undisputed lead architecture that has ushered in a profound new open era in computing. The innovations and implementations of RISC-V span from embedded to enterprise, from IoT to HPC. RISC-V is delivering on extensions, tools, and investments of a global community ranging from start-ups to multi-nationals, from students to research fellows. This talk will highlight that progress and opportunity, with an invitation to engage.
W09.4 Keynote 3
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 14:30 CET
Session chair:
Elif Bilge Kavun, University of Passau, DE
Time | Label | Presentation Title Authors |
---|---|---|
13:30 CET | W09.4.1 | PROGRAMMABILITY AND SUSTAINABILITY IN THE FUTURE INTERNET Keynote Speaker: Paola Grosso, University of Amsterdam, NL |
W10.S4 Panel on Autonomous Systems Design
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 13:30 CET - 15:00 CET
Session Chair:
David Harel, Weizmann Institute of Science, IL
Panellists:
Sandeep Neema, DARPA, US
Alberto Sangiovanni, UC Berkeley, US
Carlo Ghezzi, Politecnico di Milano, IT
Simon Burton, Fraunhofer, DE
Michael Paulitsch, Intel, DE
Arne Haman, Bosch Research, DE
Organizers:
- David Harel, Weizmann Institute of Science
- Joseph Sifakis, Verimag Laboratory
Moderator:
- David Harel, Weizmann Institute of Science
Topics to be discussed by the panel:
1) What is your vision for AS? For example, the role of these systems in the IoT and AI revolutions; autonomy as a step from weak to strong AI; the gap between automated and autonomous systems.
2) What challenges do you see in AS design? For example, AI-enabled end-to-end solutions; "hybrid design" approaches, integrating model- and data-driven components; systems engineering issues.
3) How should we ensure the reliability of AS? For example, achieving explainable AI; adapting and extending rigorous V&V techniques to ASs; ensuring safety based exclusively on simulation and testing.
4) Looking to the future, is the vision of total autonomy viable? How can we make it happen? For example, decisive factors for acceptance; research challenges; ethical issues; "easy" total autonomy categories.
W03.5 Neuromorphic Architecture & Design
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:00 CET - 15:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | W03.5.1 | EFFECT OF DEVICE MISMATCHES IN DIFFERENTIAL OSCILLATORY NEURAL NETWORKS Speaker: Jafar Shamsi, University of Calgary, CA |
14:30 CET | W03.5.2 | MACHINE LEARNING FOR THE DESIGN OF WAVE AND OSCILLATOR-BASED COMPUTING DEVICES Speaker: Gyorgy Csaba, Pázmány University Budapest, HU |
W04.4 Panel on Licensing, Funding, Cooperation & Regulation
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:00 CET - 15:30 CET
Session chair:
Andrew Katz, OpenForum Europe, GB
There are several non-technical challenges for Open Source Hardware that should be addressed on the regulatory and policy level. There is a need for researching funding mechanisms that could increase Europe's influence and the level of participation of and cooperation between SMEs, academia and large companies, as well as a clarification of the regulatory and licensing issues that might stifle innovation. Are there policy solutions that can support such developments and Open Source Hardware initiatives? Who can be a driver of change? In this panel the panellists will discuss these pertinent issues and share their experiences.
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | W04.4.1 | INTRODUCTION OF THE PANEL'S TOPICS AND PANELISTS Presenter: Andrew Katz, OpenForum Europe, GB |
14:05 CET | W04.4.2 | AFTER EATING SOFTWARE, OPEN IS “EATING” EVERYTHING! Panellist: Mike Milinkovich, Eclipse Foundation, BE |
14:10 CET | W04.4.3 | EUROPRACTICE AS BREEDING GROUND FOR EUROPEAN OPEN SOURCE HARDWARE INITIATIVES Presenter: Romano Hoofman, Europractice, BE |
14:15 CET | W04.4.4 | FUNDING OPEN SOURCE HARDWARE: GETTING THE BEST FROM PUBLICLY FUNDED RESEARCH THROUGH COMMERCIAL PARTNERSHIPS Presenter: Javier Serrano, CERN, CH |
14:20 CET | W04.4.5 | REINFORCING LARGE-SCALE DESIGN CAPACITIES: A PARTIAL VIEW FROM A FUNDING AGENCY Presenter: Arian Zwegers, European Commission, BE |
14:25 CET | W04.4.6 | BUILDING EUROPEAN RISC-V LEADERSHIP IN GLOBAL OPEN SOURCE HARDWARE Presenter: Calista Redmond, RISC-V International, US |
14:30 CET | W04.4.7 | WHAT ARE LEGAL CHALLENGES FOR WIDESPREAD USE OF OPEN SOURCE HW? WHAT ARE THE LICENSING ISSUES? Presenter: Andrew Katz, OpenForum Europe, GB |
14:35 CET | W04.4.8 | PANEL DISCUSSION ON LICENSING, FUNDING, COOPERATION & REGULATION. Panellists: Romano Hoofman1, Javier Serrano2, Arian Zwegers3, Calista Redmond4 and Mike Milinkovich5 1Europractice, BE; 2CERN, CH; 3European Commission, BE; 4RISC-V International, US; 5Eclipse Foundation, BE Moderator: Andrew Katz, OpenForum Europe, GB Abstract Q&A session with the audience and panel discussion on Licensing, Funding, Cooperation & Regulation. |
W01.T3 Technical Session 3 - Reliability and Safety
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:30 CET - 15:30 CET
Session Chair:
Michelangelo Grosso, STMicroelectronics, IT
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | W01.T3.1 | IMPROVING INSTRUCTION CACHE MEMORY RELIABILITY UNDER REAL-TIME CONSTRAINTS Speaker: Fabien Bouquillon, Université de Lille, FR |
14:45 CET | W01.T3.2 | COMMON DATA LANGUAGE CONNECTING HTOL TESTING TO IN-FIELD USE Speaker: Marc Hutner, proteanTecs, CA |
15:00 CET | W01.T3.3 | EFFICIENT USE OF ON-LINE LOGICBIST TO ACHIEVE ASIL B IN A GPU IP Speaker: Lee Harrison, Siemens EDA, GB |
15:15 CET | W01.T3.4 | VERIFICATION AND VALIDATION OF SAFETY ELEMENT OUT OF CONTEXT Speaker: Shivakumar Chonnad, Synopsys Inc, US |
W05.7 Closing Notes
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:30 CET - 14:40 CET
Speakers:
Christian Weis, University of Kaiserslautern, DE
Leibin Ni, Huawei Technologies Co., Ltd., CN
W09.5 Session 2: Lightweight Security & Safety for Emerging Technologies
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:45 CET - 15:45 CET
Session chair:
Ilia Polian, University of Stuttgart, DE
Time | Label | Presentation Title Authors |
---|---|---|
14:45 CET | W09.5.1 | FUNCTIONAL SAFETY OF DEEP NEURAL NETWORK ACCELERATOR Speaker: Kanad Basu, University of Texas at Dallas, US |
15:05 CET | W09.5.2 | SUSTAINABLE LATTICE BASED CRYPTOGRAPHY USING OPENCL NUMBER-THEORETIC TRANSFORM Speaker: Apostolos Fournaris, ISI, GR |
15:25 CET | W09.5.3 | REMOTE ATTESTATION OF IOT DEVICES: PAST, PRESENT AND FUTURE Speaker: Md Masoom Rabbani, ES&S, imec-COSIC, ESAT, BE |
W02.CB Coffee Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 14:50 CET - 15:15 CET
W03.CB Coffee Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:00 CET - 15:30 CET
W10.B3 Break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:00 CET - 15:10 CET
W10.S5 V2X, Edge Computing and Connected Applications
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:10 CET - 16:50 CET
Session chair:
Dirk Ziegenbein, Robert Bosch GmbH, DE
Speaker & Panellists:
Frank Hofmann, Bosch, DE
Chun-Ting Chou, OmniEyes, TW
Ziran Wang, Toyota Motor North America R&D, US
Stefano Marzani, Amazon, US
Connectivity enables many advanced applications for vehicles. In particular, the interactions between vehicles and edge servers (or roadside units) further boost this trend and involve more players in the business. In this session, experts from an auto-maker, a supplier, a high-tech company, and a start-up will come together, describe their roles in the connected and edge-computing environment, and discuss potential integration or competition.
Agenda of Talks and Speakers/Panelists:
- 15:10 - 15:25: "Video Uberization Using Edge AI and Mobile Video", Chun-Ting Chou, OmniEyes, TW
- 15:25 - 15:40: "Connected Applications as Driver for Automation", Frank Hofmann, Bosch, DE
- 15:40 - 15:55: "Environmental parity between cloud and embedded edge as a foundation for software-defined vehicles", Stefano Marzani, Amazon, US
- 15:55 - 16:10: "Mobility Digital Twin with Connected Vehicles and Cloud Computing", Ziran Wang, Toyota Motor North America R&D, US
- 16:10 - 16:50: Interactive panel discussion on V2X, Edge Computing, and Connected Applications
W02.K KEYNOTE: 3D stacking architectures: opportunities for Augmented Reality applications
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:15 CET - 15:45 CET
Session chair:
Subhasish Mitra, Stanford University, US
Speaker:
Edith Beigné, META, US
W06 Break 1 Coffee break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:15 CET - 15:30 CET
W01.ET Embedded Tutorial - IEEE P2851 advancements
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:30 CET - 16:10 CET
Session Chair:
Oscar Ballan, Ethernovia, US
Organiser:
Jyotika Athavale, NVIDIA, US
Speakers:
Bernhard Bauer, Synopsys, DK
Meirav Nitzan, Synopsys, US
W03.6 Neuromorphic Computing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:30 CET - 17:00 CET
Time | Label | Presentation Title Authors |
---|---|---|
15:30 CET | W03.6.1 | FULLY SPINTRONIC RADIOFREQUENCY NEURAL NETWORKS Speaker: Alice Mizrahi, CNRS/Thales, FR |
16:00 CET | W03.6.2 | ANALOG OSCILLATORY NEURAL NETWORKS FOR ENERGY-EFFICIENT COMPUTING AT THE EDGE Speaker: Corentin Delacour, CNRS, FR |
16:30 CET | W03.6.3 | RELIABLE PROCESSING-IN-MEMORY BASED MANYCORE ARCHITECTURES FOR DEEP LEARNING: FROM CNNS TO GNNS Speaker: Partha Pratim Pande, Washington State University, US |
W06.2 Software development, libraries and languages
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:30 CET - 17:30 CET
Session chair:
Dionysios Diamantopoulos, Zurich Research Laboratory, CH
This session turns to software support in the form of programming models, runtime adaptability, modern libraries and programming abstractions for heterogeneous HPC and big data systems.
Time | Label | Presentation Title Authors |
---|---|---|
15:30 CET | W06.2.1 | METHODS AND TOOLS FOR ACCELERATING IMAGE PROCESSING APPLICATIONS ON FPGA-BASED SYSTEMS Speaker: Diana Göhringer, TU Dresden, DE Abstract Field Programmable Gate Arrays (FPGAs) are a promising platform for accelerating image processing as well as machine learning applications due to their parallel architecture, reconfigurability and energy-efficiency. However, programming such platforms can be quite cumbersome and time consuming compared to CPUs or GPUs. This presentation shows methods and tools for reducing the programming effort for image processing applications on FPGA-based systems. Our design methodology is based on the Open-VX standard and includes an open-source High Level Synthesis (HLS) library for generating image processing and neural network accelerators called HiFlipVX. The importance of such an approach is shown with application examples from different research projects. |
16:00 CET | W06.2.2 | GRIDTOOLS: HIGH-LEVEL HPC LIBRARIES FOR WEATHER AND CLIMATE Speaker: Hannes Vogt, ETH Zurich / CSCS, CH Abstract GridTools is a set of C++ libraries and Python tools to enable weather and climate scientists to express their computations in a high-level hardware-agnostic way, while providing highly efficient execution of the codes. |
16:30 CET | W06.2.3 | DOMAIN-SPECIFIC MULTI-LEVEL IR REWRITING FOR GPU: THE OPEN EARTH COMPILER FOR GPU-ACCELERATED CLIMATE SIMULATION Speaker: Tobias Grosser, University of Edinburgh, GB Abstract Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM’s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure. |
17:00 CET | W06.2.4 | CLIMBING EVEREST: DESIGN ENVIRONMENT FOR EXTREME-SCALE BIG DATA ANALYTICS ON HETEROGENEOUS PLATFORMS Speaker: Gianluca Palermo, Politecnico di Milano - DEIB, IT Abstract This talk introduces the consortium-wide effort carried out within the EVEREST H2020 project. The EVEREST project aims at developing a holistic design environment that simplifies the programmability of High-Performance Big Data analytics for heterogeneous, distributed, scalable, and secure systems. Our effort is concentrated on the use of a “data-driven” design approach together with domain-specific language extensions, hardware-accelerated AI, and efficient run-time monitoring while considering a unified hardware/software paradigm. The project targets a wide range of applications, from weather-analysis-based production for renewable energy market trading, to air-quality monitoring of industrial sites, and real-time traffic modeling for transportation in smart cities. |
W02.4 Session 4: 3D Design, Methodology and Thermal
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:45 CET - 17:05 CET
Session chair:
Peter Ramm, Fraunhofer, DE
Time | Label | Presentation Title Authors |
---|---|---|
15:45 CET | W02.4.3 | EDA TOOLS AND PPA TRADEOFF STUDIES FOR MICRO-BUMP AND HYBRID BOND 3D ICS Speaker: SungKyu Lim, GeorgiaTech, US |
16:05 CET | W02.4.2 | HETEROGENEOUS PACKAGING DESIGN AND VERIFICATION WORKFLOWS Speaker: Anthony Mastroianni, SIEMENS EDA, US |
16:25 CET | W02.4.4 | TOWARDS A PLACE AND ROUTE FLOW FOR HIGH DENSITY 3D-ICS Speaker: Mauricio Altieri, CEA, FR |
16:45 CET | W02.4.1 | CHALLENGES AND OPPORTUNITIES FOR THERMALS IN HETEROGENEOUS 3D PACKAGING Speaker: Rajiv Mongia, INTEL, US |
W04.5 Panel on Industrial Concerns
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 15:45 CET - 17:15 CET
Session chair:
Frank Gürkaynak, ETH Zürich, CH
Open source hardware, especially around the RISC-V architecture, has been a talking point in recent years. Development has been rapid: from humble beginnings, when open source hardware was a niche pursuit of enthusiasts and academics, we now have multi-billion-dollar companies and, recently, a commitment from the European Commission to support work in open source hardware. In this session, we would like to go beyond the buzz and discuss with people involved in industry what opportunities they see, what the potential roadblocks are, and what they think is still missing.
Time | Label | Presentation Title Authors |
---|---|---|
15:45 CET | W04.5.1 | INTRODUCTION OF THE PANEL'S TOPICS AND PANELISTS Presenter: Frank Gürkaynak, ETH Zürich, CH |
15:50 CET | W04.5.2 | OPENHW CORE-V: RISC-V OPEN-SOURCE CORES FOR HIGH VOLUME PRODUCTION SOCS Presenter: Rick O'Connor, OpenHW Group, CA |
15:55 CET | W04.5.3 | LEVERAGING OPEN SOURCE HARDWARE IN COMMERCIAL PRODUCTS: BENEFITS AND CHALLENGES Presenter: Loïc Lietar, GreenWaves, FR |
16:00 CET | W04.5.4 | CAN OPEN SOURCE HW ADDRESS INDUSTRIAL CONCERNS FOR CYBERSECURITY AND TRUSTED ELECTRONICS? Presenter: Matthias Hiller, Fraunhofer, ECSO, DE |
16:05 CET | W04.5.5 | INDUSTRIAL REQUIREMENTS FOR OPEN SOURCE HARDWARE Presenter: Jean-Christian Kircher, Bosch, FR |
16:10 CET | W04.5.6 | THALES' PERSPECTIVES ON OPEN SOURCE HARDWARE Presenter: Thierry Collette, Thales, FR |
16:15 CET | W04.5.7 | INDUSTRIAL CONCERNS ABOUT OPEN HARDWARE Presenter: Zdeněk Přikryl, Codasip, CZ |
16:20 CET | W04.5.8 | PANEL DISCUSSION ON INDUSTRIAL CONCERNS Panellists: Rick O'Connor1, Loïc Lietar2, Matthias Hiller3, Jean-Christian Kircher4, Thierry Collette5 and Zdeněk Přikryl6 1OpenHW Group, CA; 2GreenWaves, FR; 3Fraunhofer, ECSO, DE; 4Bosch, FR; 5Thales, FR; 6Codasip, CZ Moderator: Frank Gürkaynak, ETH Zürich, CH Abstract Q&A session with the audience and panel discussion on industrial concerns of open source hardware. |
W09.6 Panel: Is Security An Enabler, An Enemy or Simply A Nightmare for Sustainability?
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 16:00 CET - 17:00 CET
Session Chairs:
Francesco Regazzoni, University of Amsterdam, NL and Università della Svizzera italiana, CH
Elif Bilge Kavun, University of Passau, DE
Panellists:
David Bol, UC Louvain, BE
Yuri Demchenko, University of Amsterdam, NL
Marc Stoettinger, Rhein Main University of Applied Sciences, DE
Ruggero Susella, ST Microelectronics, IT
W01.3 Invited Session 2: The challenges of reaching zero defect and functional safety – and how the EDA industry tackles them
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 16:30 CET - 17:30 CET
Session Chair:
Daniel Tille, Infineon, DE
Organisers:
Riccardo Cantoro, Politecnico di Torino, IT
Daniel Tille, Infineon, DE
Abstract: Automotive microcontrollers have become very complex Systems-on-Chip (SoCs). Especially the megatrends of Assisted Driving (ADAS) and Automated Driving (AD), but also traditional applications such as power-train and steering, require ever-increasing functionality. However, these safety-critical environments require zero defects, and the implementation of functional safety measures together with the rising complexity poses significant challenges to satisfying these requirements. This special session addresses these challenges and shows potential solutions to overcome them with the help of the EDA industry.
Presentations:
- Automated solutions for safety and security vulnerabilities
Teo Cupaiuolo (Synopsys)
- Functional Safety: an EDA perspective
Alessandra Nardi (Cadence)
- The Zero Defect Goal For Automotive ICs
Lee Harrison (Siemens EDA); Nilanjan Mukherjee (Siemens)
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | W01.3.1 | AUTOMATED SOLUTIONS FOR SAFETY AND SECURITY VULNERABILITIES Speaker: Teo Cupaiuolo, Synopsys, IT |
16:50 CET | W01.3.2 | FUNCTIONAL SAFETY: AN EDA PERSPECTIVE Speaker: Alessandra Nardi, Cadence, US |
17:10 CET | W01.3.3 | THE ZERO DEFECT GOAL FOR AUTOMOTIVE ICS Speaker: Lee Harrison, Siemens EDA, GB |
W10.S6 Closing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 16:50 CET - 17:00 CET
W02.C Workshop Closing Remarks
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 17:05 CET - 17:15 CET
W01.4 Panel: What are the limitations of EDA tools with respect to zero defects and FuSa?
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 17:30 CET - 18:30 CET
Session Chair:
Wim Dobbelaere, onsemi, BE
Organiser:
Davide Appello, STMicroelectronics, IT
Panellists:
Antonio Priore, ARM, GB
Georges Gielen, KU Leuven, BE
Chen He, NXP, US
Mauro Pipponzi, ELES, IT
Vladimir Zivkovic, Infineon, DK
Om Ranjan, STMicroelectronics, IN
Abstract: Product segments with high quality demands, such as automotive, transportation, and aerospace, have been characterized by persistent needs across several years:
- Zero defects, or in general very low defective levels
- Accurate modeling and prediction of product reliability
The sustainability of these objectives is challenged by the relentless demand for higher-performance products and the consequent move to higher complexity and advanced technology nodes.
Functional safety standards and requirements aim to guarantee the usability of products in safety-critical applications and add several requirements whose satisfaction is a key concern during the development of a new product.
The proposed panel session will debate with the experts how effectively the available EDA tools are helping to face the described challenges.
As an example, these are suitable questions that anyone in the field may need answering:
- How does EDA help to effectively resolve, "end-to-end", the traceability of defined requirements? Does this represent a sustainable effort?
- Is DFT effective enough in addressing fault models to reach target quality?
- Is verification/simulation/validation effective with respect to transient fault modes?
W06 Break 2 Coffee break
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 17:30 CET - 17:40 CET
W06.3 Discussion and closing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 17:40 CET - 18:00 CET
15.2 Panel: Forum on Advancing Diversity in EDA (DivEDA)
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 18:30 CET - 20:30 CET
Session chair:
Ayse K Coskun, Boston University, US
Session co-chair:
Nele Mentens, KU Leuven, BE
Panellists:
Ileana Buhan, Radboud University Nijmegen, NL
Michaela Blott, Xilinx, IE
Andreia Cathelin, STMicroelectronics, FR
Marian Verhelst, KU Leuven, BE
The 3rd Advancing Diversity in EDA (DivEDA) forum is co-sponsored by IEEE CEDA and ACM SIGDA. The goal of DivEDA is to help women and underrepresented minorities (URM) advance their careers in academia and industry, and hence, to help increase diversity in the EDA community. A more diverse community will then help accelerate innovation in the EDA ecosystem and benefit societal progress. Through an interactive medium, our aim is to provide practical tips to women and URM on how to succeed and to overcome possible hurdles in their career growth, while at the same time, connecting senior and junior researchers to enable a growing diverse community. We are excited to build upon earlier diversity-focused efforts in EDA and create a venue that aims to make a difference. Prior DivEDA editions were held at DATE’18 and DAC’19. This year’s forum will be held as a single 2-hour virtual session, including a 1-hour panel followed by smaller group mentoring and Q&A sessions. The topic of the forum is “Addressing career challenges during the pandemic: work-life balance, networking, and more”. Registration to the event is free of charge.
W01.5 Closing
Add this session to my calendar
Date: Friday, 18 March 2022
Time: 18:30 CET - 18:45 CET
Chairs:
Paolo Bernardi, Politecnico di Torino, IT
Riccardo Cantoro, Politecnico di Torino, IT
Yervant Zorian, Synopsys, US
Wim Dobbelaere, onsemi, BE
M02 Tutorial: Computing with High-dimensional Vectors for Energy-Efficient AI: Hyperdimensional Computing aka Vector Symbolic Architectures
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:00 CET - 13:00 CET
Organisers:
Abbas Rahimi, IBM Research Zurich, CH
Denis Kleyko, University of California, Berkeley, US and Research Institutes of Sweden, Kista, SE
Evgeny Osipov, Luleå University of Technology, SE
Jan M. Rabaey, University of California, Berkeley, US
Abstract
This tutorial will introduce an emerging computing framework that is based on using high-dimensional distributed vectors to represent and manipulate symbols, data structures, and functions. This framework, commonly known as either Hyperdimensional Computing or Vector Symbolic Architectures, originated at the intersection of symbolic and connectionist approaches to Artificial Intelligence but has turned into a research area of its own. Hyperdimensional Computing/Vector Symbolic Architectures is a highly interdisciplinary area with connections to neuroscience, computer science, electrical engineering, mathematics, and cognitive science. This fact makes it challenging to form a thorough picture of the area. At the same time, we believe that it is extremely important to facilitate the entry of new researchers into the area. Therefore, the purpose of this tutorial is to convey the framework and recent developments to interested researchers.
The tutorial will cover such aspects of the area as: known computational models, transformations of various input data to high-dimensional distributed representations, applications with a focus on machine learning, and efficient hardware implementations using emerging technologies.
Motivation
There is a global trend of searching for computing paradigms alternative to the conventional one, such as the emerging fields of neuromorphic and nanoscalable hardware. Moreover, there is a strong demand for low-cost data-driven approaches, referred to by the terms “tiny machine learning” and “edge machine learning”. We foresee that Hyperdimensional Computing/Vector Symbolic Architectures are going to play an important role in providing tiny machine learning on unconventional hardware. The tutorial is timely, as we see a sharp peak of interest in the topic from the electrical engineering community.
Goal
We see that the major problem for researchers new to the area is that the works on Hyperdimensional Computing/Vector Symbolic Architectures are spread across many disciplines and cannot be tracked easily. Thus, understanding the state of the art of the area is not trivial due to the wide spread of venues. Therefore, in this tutorial we aim at covering the main topics within Hyperdimensional Computing/Vector Symbolic Architectures as well as providing the broad coverage of the area, which is often missing. The tutorial will have hands-on elements, but many of its parts will focus on fundamentals and presenting the state of the art.
Necessary background
The tutorial is intended for participants with basic knowledge of linear algebra, probability theory, elementary logic, as well as basic programming skills.
Content
The tutorial will consist of 4 main parts and a panel discussion at the end.
Part 1 - Introduction to Computing with High-dimensional Vectors: Concepts, Models, Primitives for Data Structures, and Locality-Preserving Encoding
Speaker: Denis Kleyko
Duration: 50 minutes
This part will provide a gentle introduction to Hyperdimensional Computing/Vector Symbolic Architectures by first giving an overview of the main principles and basic operations. It will also touch on ideas for transforming data from its original representation into a high-dimensional space. For example, we will discuss how to use the basic operations to represent a large variety of data structures and how to form locality-preserving representations that are essential when working with, e.g., ordinal or ratio data.
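For readers new to the area, the following is a minimal, self-contained sketch (our own illustration, not part of the tutorial materials) of the basic operations on random bipolar hypervectors — binding, bundling, and similarity — used here to represent and query a simple key-value record.

```python
# Minimal sketch of basic Hyperdimensional Computing / Vector Symbolic
# Architectures operations, using random bipolar (+1/-1) hypervectors.
# Illustrative only: the names and the bipolar (MAP-style) model are our
# own choices, not taken from the tutorial materials.
import numpy as np

D = 10_000                              # dimensionality of the hypervectors
rng = np.random.default_rng(0)

def random_hv():
    """Draw a random bipolar hypervector; any two are quasi-orthogonal."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (elementwise multiplication): associates two hypervectors."""
    return a * b

def bundle(*vs):
    """Bundling (elementwise majority vote): superposes several hypervectors."""
    return np.sign(np.sum(vs, axis=0))

def similarity(a, b):
    """Normalized dot product; close to 0 for unrelated random hypervectors."""
    return float(a @ b) / D

# Represent the record {colour: red, shape: round} as a single hypervector.
colour, shape, red, round_ = (random_hv() for _ in range(4))
record = bundle(bind(colour, red), bind(shape, round_))

# Unbinding with the 'colour' key recovers something similar to 'red',
# because binding with a bipolar vector is (approximately) its own inverse.
query = bind(record, colour)
print(similarity(query, red))     # noticeably above chance (~0.5 here)
print(similarity(query, round_))  # close to 0
```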
Part 2 - Computing with High-dimensional Vectors in Machine Learning
Speaker: Evgeny Osipov
Duration: 50 minutes
This part will focus on the use of Hyperdimensional Computing/Vector Symbolic Architectures in the context of machine learning. We will first survey use cases where the high-dimensional representations introduced in the previous part are used as input to classical machine learning algorithms, discussing the pros and cons of such approaches. We will then describe how entire algorithms for randomly connected artificial neural networks (Random Vector Functional Link networks, Echo State Networks) can be implemented purely with Hyperdimensional Computing/Vector Symbolic Architectures operations, and discuss the implications of such a design for power-efficient implementations of neural algorithms. We conclude by presenting a holistic approach to unsupervised learning that uses high-dimensional representations as input and Hyperdimensional Computing/Vector Symbolic Architectures operations to realize the algorithm.
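As a concrete, hypothetical illustration of this kind of machine-learning use, the sketch below builds a simple prototype-based classifier: features are encoded into bipolar hypervectors, the encodings of each class are bundled into a class prototype, and queries are assigned to the most similar prototype. The names and the random-projection encoding are our own choices, not code from the tutorial.

```python
# Sketch of a prototype-based classifier in the Hyperdimensional Computing /
# Vector Symbolic Architectures style (our own illustration, not code from
# the tutorial): real-valued features are projected into a high-dimensional
# bipolar space, training encodings of each class are bundled into a class
# prototype, and queries go to the prototype with the highest dot product.
import numpy as np

D, n_features = 10_000, 16
rng = np.random.default_rng(1)
projection = rng.standard_normal((n_features, D))   # fixed random projection

def encode(x):
    """Locality-preserving encoding: random projection followed by sign."""
    return np.sign(x @ projection)

def train(X, y):
    """Bundle the encodings of each class into one prototype per class."""
    return {label: np.sign(encode(X[y == label]).sum(axis=0))
            for label in np.unique(y)}

def predict(prototypes, x):
    """Return the label of the most similar class prototype."""
    hv = encode(x)
    return max(prototypes, key=lambda label: prototypes[label] @ hv)

# Toy usage on two Gaussian blobs.
X0 = rng.normal(-1.0, 1.0, size=(50, n_features))
X1 = rng.normal(+1.0, 1.0, size=(50, n_features))
X, y = np.vstack([X0, X1]), np.array([0] * 50 + [1] * 50)
prototypes = train(X, y)
print(predict(prototypes, rng.normal(+1.0, 1.0, size=n_features)))  # expected: 1
```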
Part 3 - Computing with High-dimensional Vectors: From Efficient Classifiers to Efficient General AI
Speaker: Abbas Rahimi
Duration: 50 minutes
This part first focuses on how to build efficient Hyperdimensional Computing/Vector Symbolic Architectures-based classifiers, with main emphasis on few-shot learning and implementations on analog in-memory computing hardware. Next, it will provide insights on how to expand the application area of Hyperdimensional Computing/Vector Symbolic Architectures beyond narrow AI towards general AI.
Part 4 - Hardware for Computing with High-dimensional Vectors: State-of-the-Art, Challenges, and Perspectives
Speaker: Jan M. Rabaey
Duration: 50 minutes
Part 5 - Panel discussion
Speakers: Denis Kleyko, Evgeny Osipov, Abbas Rahimi, Jan M. Rabaey.
Duration: 30 minutes
The time allocated for the panel discussion will be used to elaborate on aspects of Hyperdimensional Computing/Vector Symbolic Architectures that might not be covered in great detail by the tutorial but would be of interest to parts of the audience.
Schedule (all times in CET)
- 09:00 - 09:50 Part 1 – Denis Kleyko. Introduction to Computing with High-dimensional Vectors: Concepts, Models, Primitives for Data Structures, and Locality-Preserving Encoding
- 09:50 - 10:40 Part 2 – Evgeny Osipov. Computing with High-dimensional Vectors in Machine Learning
- 10:40 - 10:50 Break
- 10:50 - 11:40 Part 3 – Abbas Rahimi. Computing with High-dimensional Vectors: From Efficient Classifiers to Efficient General AI
- 11:40 - 12:30 Part 4 – Jan M. Rabaey. Hardware for Computing with High-dimensional Vectors: State-of-the-Art, Challenges, and Perspectives
- 12:30 - 13:00 Panel discussion
Tutorial material
We plan to provide attendees with the following materials:
- A list of recommended reading;
- Jupyter notebook for basic data structures;
- Jupyter notebook for an example of a simple classification model;
- A comprehensive list of GitHub projects related to machine learning applications.
M02.1 Part 1: Introduction to Computing with High-dimensional Vectors: Concepts, Models, Primitives for Data Structures, and Locality-Preserving Encoding
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:00 CET - 09:50 CET
Presenter:
Denis Kleyko, University of California, Berkeley, US and Research Institutes of Sweden, Kista, SE
M02.2 Part 2: Computing with High-dimensional Vectors in Machine Learning
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:50 CET - 10:40 CET
Presenter:
Evgeny Osipov, Luleå University of Technology, SE
M02.3 Part 3: Computing with High-dimensional Vectors: From Efficient Classifiers to Efficient General AI
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 10:50 CET - 11:40 CET
Presenter:
Abbas Rahimi, IBM Research Zurich, CH
M02.4 Part 4: Hardware for Computing with High-dimensional Vectors: State-of-the-Art, Challenges, and Perspectives
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 11:40 CET - 12:30 CET
Presenter:
Jan M. Rabaey, University of California, Berkeley, US
M02.5 Part 5: Panel discussion
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 12:30 CET - 13:00 CET
Panellists:
Abbas Rahimi, IBM Research Zurich, CH
Denis Kleyko, University of California, Berkeley, US and Research Institutes of Sweden, Kista, SE
Evgeny Osipov, Luleå University of Technology, SE
Jan M. Rabaey, University of California, Berkeley, US
M03 Accelerating Inferencing on Embedded Systems
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:00 CET - 13:00 CET
Organiser:
Mathilde Karsenti, Siemens EDA, US
Speakers:
Russell Klein, Program Director, Siemens EDA, US
Petri Solanti, Engineer, Siemens EDA, DE
Motivation:
Machine learning requires significant computational capabilities. Inferencing often needs to be done on embedded systems, for response-time or privacy reasons, where compute resources are limited. The compute demands can be addressed through parallel computation on multi-core systems, GPUs, or even TPUs (machine learning accelerators). However, optimal performance is achieved through the development of bespoke accelerators deployed in FPGA or ASIC hardware.
Goal:
The goal of this tutorial is to present and discuss how to achieve high-performance, yet low-power inferencing on computationally and power-constrained embedded edge systems.
Technical Details:
This tutorial will describe how High Level Synthesis (HLS) can be used to implement and verify accelerators for machine learning algorithms. It will cover how to explore architectural alternatives and comparisons with more traditional acceleration approaches.
High-Level Synthesis offers an alternative implementation method that allows developers to go from algorithm to hardware in a much shorter time and with significantly less human (and error-prone) interpretation. As machine learning makes rapid advances, older algorithms are quickly discarded as more effective ones are developed. Having a fast path from algorithm to implementation ensures that systems will incorporate the latest advances available. High-Level Synthesis is a mature technology, is available from the major EDA and FPGA vendors, and has proven capable of accelerating algorithms.
This tutorial will take the inferencing portion of a machine learning object-recognition algorithm and show how it can be developed, optimized, and verified through high-level synthesis. It will start with the algorithm running in a desktop environment in a high-level machine learning framework and will demonstrate the steps needed to move that algorithm into a hardware implementation suitable for deployment in an embedded system.
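To give a flavour of one of the steps covered (quantization of the inferencing algorithm), the following is a small, self-contained sketch, assuming a generic trained layer rather than the tutorial's actual object-recognition example: float32 weights and activations are mapped to 8-bit integers with per-tensor scales, and the error of the resulting integer multiply-accumulate is measured against the floating-point reference.

```python
# Hypothetical sketch of the quantization step (not Siemens EDA material):
# a trained float32 layer is mapped to 8-bit signed integers with per-tensor
# scales, and the error of the integer multiply-accumulate — what a hardware
# datapath would actually compute — is measured against the float reference.
import numpy as np

rng = np.random.default_rng(42)
weights = rng.normal(0.0, 0.5, size=(64, 64)).astype(np.float32)
activations = rng.uniform(0.0, 1.0, size=64).astype(np.float32)

def quantize(tensor, n_bits=8):
    """Symmetric per-tensor quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.max(np.abs(tensor))) / qmax
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

w_q, w_scale = quantize(weights)
a_q, a_scale = quantize(activations)

reference = weights @ activations                    # float32 result
approx = (w_q @ a_q) * (w_scale * a_scale)           # integer MAC + rescale
print("max abs error:", np.max(np.abs(reference - approx)))
```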
Schedule:
- 09:00 - 09:15: Opening
- 09:15 - 09:45: Introduction to Embedded Inferencing, Survey of Acceleration Techniques
- 09:45 - 10:00: Profiling Embedded System Execution
- 10:00 - 11:00: High-Level Synthesis of Inferencing Algorithms
- 11:00 - 11:15: Break/Coffee
- 11:15 - 11:45: Quantization of Inferencing Algorithm, Area and Power Savings
- 11:45 - 12:15: Power, Performance, and Area optimization of Inferencing Accelerator
- 12:30 - 12:50: Verification of Accelerator
- 12:50 - 13:00: Summary and Q & A
Attendees will receive:
- A copy of presentation materials
- Source code for an exemplary project
Necessary background:
- An understanding of hardware development methodologies, either ASIC or FPGA
- Some understanding of machine learning algorithms, specifically how inferencing is performed
- Some understanding of video processing algorithms would be helpful
M03.0 Opening/Introduction
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:00 CET - 09:15 CET
Speaker:
Mathilde Karsenti, Siemens EDA, US
M03.1 Introduction to Embedded Inferencing, Survey of Acceleration Techniques
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:15 CET - 09:45 CET
Speaker:
Russell Klein, Siemens EDA, US
M03.2 Profiling Embedded System Execution
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 09:45 CET - 10:00 CET
Speaker:
Russell Klein, Siemens EDA, US
M03.3 High-Level Synthesis of Inferencing Algorithms
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 10:00 CET - 11:00 CET
Speaker:
Petri Solanti, Siemens EDA, DE
M03.4 Quantization of Inferencing Algorithm
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 11:15 CET - 11:45 CET
Speaker:
Petri Solanti, Siemens EDA, DE
M03.5 Power, Performance, and Area optimization of Inferencing Accelerator
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 11:45 CET - 12:15 CET
Speaker:
Russell Klein, Siemens EDA, US
M03.6 Verification of Accelerator
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 12:30 CET - 12:50 CET
Speaker:
Petri Solanti, Siemens EDA, DE
M03.7 Question and Answer Period
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 12:50 CET - 13:00 CET
Panellists:
Russell Klein, Siemens EDA, US
Petri Solanti, Siemens EDA, DE
Mathilde Karsenti, Siemens EDA, US
M01 Using Organic Printed Electronics PDK (OPDK) to design circuits for Integrated Sensor Platforms
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:15 CET - 17:15 CET
Organisers:
Jasmin Aghassi, Karlsruhe Institute of Technology, DE
Anton Klotz, Cadence Design Systems, DE
Josef Mittermaier, Cadence Design Systems, DE
Mark Willoughby, Europractice, GB
Kai Exner, BASF, DE
Karl-Philipp Strunk, Innovation Lab, DE
Speakers:
Palak Gupta, Innovation Lab, DE
Justas Lukosiunas, Cadence Design Systems, DE
Kai Exner, BASF, DE
Sebastian Stehlin, Innovation Lab, DE
Gabriel Cadilha Marques, Karlsruhe Institute of Technology, DE
Motivation
Organic printed electronics is a rapidly emerging industrial field with numerous applications such as the Internet of Things (IoT) and wearable devices. However, to further aid this growth, a strong need for robust, automated electronic design tools has been recognized.
As a part of the 2-HORISONS project (an international collaboration between Innovation Lab, Karlsruhe Institute of Technology, University of Heidelberg, BASF, Cadence Design Systems, Centre for advanced soft electronics (South Korea) and Nextflex (USA)), the cluster partners have been collaborating to create a comprehensive process design kit for printed organic electronics (OPDK).
To showcase current functionality and features of the OPDK, we would like to demonstrate how to use the full front-to-back design flow to craft manufacturable standard logic circuits.
Necessary background
Basic knowledge of analog and digital circuits as well as printed electronics would be appreciated.
Additionally, familiarity with the Cadence Virtuoso design platform will be helpful but is not strictly required.
Content
The tutorial will consist of 3 main parts:
Part 1 – General overview & introduction to materials, processes and modelling
Speakers: Dr. Kai Exner, Dr. Sebastian Stehlin, Dr. Gabriel Cadilha Marques
Duration: 1hr
The targeted technology platform relies on inkjet- and screen-printed electrical components, such as resistors, thin-film transistors, and sensors, which can be modularly employed to realize electronic circuitry. The manufacturing process on flexible foil substrates utilizes polymer-based p-type semiconductors, dielectric materials for insulation, and multiple metal layers to construct device terminals and interconnections.
To enable circuit simulation, test structures are fabricated and characterized, and device models (DC, AC [2], and variation [3]) are extracted from them.
Section 1: Introduction to materials and manufacturing processes
Section 2: Special features of the printed electronics technology
Section 3: Modelling approach
Part 2 - Basic OPDK usage and design of a standard cell (inverter)
Speakers: Justas Lukosiunas & Palak Gupta
Duration: 1.5hr
Part 2 will cover the basics of the Cadence Virtuoso-based design environment used to design an inverter circuit from start to finish. The full front-to-back design flow will be exercised, including schematic entry, pre-layout simulation, layout creation, Design Rule Check (DRC) and Layout Versus Schematic (LVS) checks, parasitic extraction, and post-layout simulation. This part will be split into a presentation and a hands-on section.
Section 1: Introduction to the tools and front-to-back design flow
Section 2: (Hands-on) Cloud-based practical session for creating an inverter circuit
Part 3 – Designing a binary decoder circuit as part of readout circuitry for an active sensor matrix
Speakers: Justas Lukosiunas & Palak Gupta & Dr. Sebastian Stehlin
Duration: 1hr
Part 3 will focus on using existing standard cells to form a more complex logic circuit – a binary decoder. This design is a critical part of the readout circuitry for an active sensor matrix. The proposed schematic architecture will be simulated. On the layout side, important points on floorplanning, cell placement, and routing will be discussed.
Section 1: Motivation: Large area active matrix sensor platforms
Section 2: Schematic & pre-layout simulation
Section 3: Layout creation, parasitic extraction & post-layout simulation
Agenda (all times in CET):
13:15 - 13:25 Tutorial Welcome Coffee
13:25 - 13:45 Part 1, Section 1 - Dr. Kai Exner
13:45 - 14:05 Part 1, Section 2 - Dr. Sebastian Stehlin
14:05 - 14:25 Part 1, Section 3 - Dr. Gabriel Cadilha Marques
14:25 - 14:35 Coffee Break
14:35 - 15:05 Part 2, Section 1 - Justas Lukosiunas & Palak Gupta
15:05 - 16:05 Part 2, Section 2 - Justas Lukosiunas & Palak Gupta (Hands-on)
16:05 - 16:15 Coffee Break
16:15 - 16:25 Part 3, Section 1 - Dr. Sebastian Stehlin
16:25 - 16:50 Part 3, Section 2 - Justas Lukosiunas & Palak Gupta
16:50 - 17:15 Part 3, Section 3 - Justas Lukosiunas & Palak Gupta
The current version of the OPDK can be downloaded here: https://www.int.kit.edu/7730.php
M04 Modern High-Level Synthesis for Complex Data Science Applications
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:15 CET - 17:15 CET
Organisers:
Serena Curzel, Pacific Northwest National Laboratory, US and Politecnico di Milano, IT
Nicolas Bohm Agostini, Pacific Northwest National Laboratory and Northeastern University, US
Michele Fiorito, Politecnico di Milano, IT
Marco Minutoli, Pacific Northwest National Laboratory, US
Vito Giovanni Castellana, Pacific Northwest National Laboratory, US
Fabrizio Ferrandi, Politecnico di Milano, IT
Antonino Tumeo, Pacific Northwest National Laboratory, US
Motivations
- Data science is a key application area that benefits from domain-specific accelerators. However, domain scientists program in high-level frameworks and develop new algorithms extremely quickly, while implementing specialized accelerators typically requires significant effort from experienced hardware designers
- Multi-level, modular, extensible, compiler-based frameworks that automate the translation of algorithms from high-level frameworks into specialized circuit designs can bridge the design productivity gap
- High-level compiler frameworks and High-Level Synthesis play a critical role in such a toolchain
Goals
The goal of this tutorial is to introduce participants to the challenges and opportunities in implementing compilers from high-level, productive programming frameworks down to silicon, and to provide hands-on experience with a set of open-source, state-of-the-art tools (SODA-OPT and PandA-Bambu HLS) whose integration enables such a no-human-in-the-loop compilation framework.
Technical Details
Data Science applications (machine learning, graph analytics) today are the main drivers for designing domain-specific accelerators, both for reconfigurable devices such as Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs). As data analysis and machine learning methods keep evolving, we are experiencing a renewed interest in high-level synthesis (HLS) and automated accelerator generation to reduce development effort and allow quick transition from the algorithmic formulation to hardware implementation. This tutorial will discuss the use of modern HLS techniques to generate domain-specific accelerators, explicitly focusing on accelerators for data science, highlighting key methodologies, trends, advantages, benefits, and gaps that still need to be closed. The tutorial will provide a direct hands-on experience with Bambu, one of the most advanced open-source HLS tools currently available, and SODA-OPT, an open-source frontend tool for HLS developed in MLIR. Bambu supports many logic synthesis and simulation tools by integrating various compiler frontends, generating accelerators targeting a variety of FPGA devices and ASIC flows, and introducing new methodologies for parallel accelerators (dataflow and multithreaded designs). SODA-OPT performs hardware/software partitioning of specifications derived from popular high-level data science and machine learning Python frameworks used in high-level data-driven applications. Additionally, it provides domain-specific optimizations to improve the high-level synthesis process of the identified hardware components. Integrating SODA-OPT with Bambu allows the generation of highly efficient accelerators for complex graph analysis and machine learning algorithms.
M04.1 Agile Hardware Design for Complex Data Science Applications: Opportunities and Challenges.
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:15 CET - 13:45 CET
Speaker:
Antonino Tumeo, Pacific Northwest National Laboratory, US
M04.2 Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications.
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:45 CET - 14:15 CET
Speaker:
Fabrizio Ferrandi, Politecnico di Milano, IT
M04.3 Hands-on: Productive High-Level Synthesis with Bambu
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 14:15 CET - 15:00 CET
Speaker:
Serena Curzel, Pacific Northwest National Laboratory, US and Politecnico di Milano, IT
M04.4 Hands-on: Compiler Based Optimizations, Tuning and Customization of Generated Accelerators
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 15:15 CET - 16:00 CET
Speaker:
Michele Fiorito, Politecnico di Milano, IT
M04.5 Hands-on: SODA-OPT: Enabling System-Level Design in MLIR for High-Level Synthesis and Beyond
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 16:00 CET - 16:45 CET
Speaker:
Nicolas Bohm Agostini, Pacific Northwest National Laboratory and Northeastern University, US
M04.6 Tech: Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 16:45 CET - 17:15 CET
Speakers:
Marco Minutoli, Pacific Northwest National Laboratory, US
Vito Giovanni Castellana, Pacific Northwest National Laboratory, US
M05 Security of quantum computing
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:15 CET - 17:15 CET
Organisers:
Swaroop Ghosh, Pennsylvania State University, US
Rasit Topaloglu, IBM, US
Quantum bits (qubits) are prone to errors such as relaxation/dephasing, gate error, readout error, and crosstalk. These noise sources (e.g., crosstalk) create a new attack surface (e.g., fault injection and information leakage), especially for future large-scale quantum computers that may employ multi-programming access. Furthermore, the success of quantum computing relies on efficient compilation of quantum programs. New but untrusted compilers may be more efficient, motivating designers to use their services; such untrusted compilers can then steal sensitive intellectual property embedded within quantum programs. This tutorial will cover the basics of quantum computing through hands-on activities in Qiskit, followed by an in-depth analysis of the vulnerabilities of quantum computers (both superconducting and trapped-ion), attack models, and countermeasures. Various demonstrations and hands-on activities will reinforce these theoretical concepts.
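As a taste of the Qiskit hands-on portion, here is a minimal sketch (our own example, using the Qiskit API as of early 2022) that prepares a Bell state and samples measurement outcomes on an ideal simulator; on real hardware, readout error and crosstalk — the noise sources discussed above — would also show up in the counts.

```python
# Minimal Qiskit sketch in the spirit of the hands-on part (our own example,
# using the Qiskit API as of early 2022): prepare a Bell state and sample
# measurement outcomes on an ideal simulator.
from qiskit import QuantumCircuit, Aer, execute

qc = QuantumCircuit(2, 2)
qc.h(0)                      # put qubit 0 into an equal superposition
qc.cx(0, 1)                  # entangle qubit 0 with qubit 1
qc.measure([0, 1], [0, 1])   # measure both qubits into classical bits

backend = Aer.get_backend("qasm_simulator")
counts = execute(qc, backend, shots=1024).result().get_counts()
print(counts)  # ideally only '00' and '11'; on real hardware, readout error
               # and crosstalk also produce some '01'/'10' outcomes
```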
M05.1 Basics of quantum computing
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:15 CET - 13:40 CET
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
M05.2 Hands-on activity using Qiskit
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:40 CET - 14:05 CET
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
M05.3 Fault injection attacks on quantum computing and countermeasures
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 14:05 CET - 14:45 CET
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
M05.4 Demonstration of fault injection attack
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 14:45 CET - 15:00 CET
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
M05.5 Compilation oriented attacks on quantum computing and countermeasures
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 15:15 CET - 16:00 CET
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
M05.6 Quantum PUF and TRNG
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 16:00 CET - 16:45 CET
Speakers:
Swaroop Ghosh, Pennsylvania State University, US
Rasit Topaloglu, IBM, US
M05.7 Discussion and wrap up
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 16:45 CET - 17:00 CET
Speakers:
Swaroop Ghosh, Pennsylvania State University, US
Rasit Topaloglu, IBM, US
M06 Approximate Computing: Circuits, Systems and Applications
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 13:15 CET - 17:15 CET
Location / Room: Virtual
Organisers:
Weiqiang Liu, Nanjing University of Aeronautics and Astronautics, CN
Jie Han, University of Alberta, CA
Alberto Bosio, Lyon Institute of Nanotechnology, FR
Fabrizio Lombardi, Northeastern University, US
Motivation:
Approximate computing has been proposed as a novel paradigm for efficient, low-power design at nanoscale. Its efficiency stems from computing approximate results with at least comparable performance and lower power dissipation than the fully accurate counterpart; approximate computing therefore generates results that are good enough rather than always fully accurate. Although computational errors are generally undesirable, applications such as multimedia, signal processing, machine learning, pattern recognition, and data mining tolerate the occurrence of some errors.
Approximate computing has received significant attention from both the research and industrial communities in the past few years, driven by the challenge of designing power-efficient computing circuits/systems for most emerging applications. The EDA research community has investigated approximate techniques at different levels; many papers on approximate computing have been published at DATE, DAC, ICCAD and ASP-DAC, in IEEE and ACM periodicals (such as TCAD), and in sessions of past DATE editions. In this tutorial, we present a comprehensive treatment of approximate computing. Starting from arithmetic circuits and modules, the tutorial expands its technical coverage to emerging and safety-critical applications, so that the audience can appreciate the potential benefits of this new computational paradigm and its implications for circuit design.
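As a simple, self-contained illustration of the paradigm (our own example, not part of the tutorial materials), the sketch below models one well-known approximate adder idea: the lower k bits are combined with a cheap bitwise OR instead of a full carry chain, while the upper bits are added exactly, trading a small, bounded error for reduced hardware cost.

```python
# Our own illustration (not from the tutorial) of a well-known approximate
# adder idea: the lower k bits are combined with a cheap bitwise OR (no carry
# chain), while the upper bits use an exact adder. The error is small and
# bounded, which error-tolerant applications can trade for power and area.
import random

def approx_add(a: int, b: int, k: int = 4, width: int = 16) -> int:
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)           # approximate lower part
    high = ((a >> k) + (b >> k)) << k       # exact upper part (carry-in dropped)
    return (high | low) & ((1 << width) - 1)

random.seed(0)
errors = [abs((a + b) - approx_add(a, b))
          for a, b in ((random.getrandbits(12), random.getrandbits(12))
                       for _ in range(10_000))]
print("mean abs error:", sum(errors) / len(errors))
print("max abs error:", max(errors))        # bounded by 2**k - 1 for this adder
```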
Tutorial Abstract:
Computing systems are conventionally designed to operate as accurately as possible. However, this trend faces severe technology challenges, such as power dissipation, circuit reliability, and performance. There are a number of pervasive computing applications (such as machine learning, pattern recognition, digital signal processing, communication, robotics, and multimedia), which are inherently error-tolerant or error-resilient, i.e., in general, they require acceptable results rather than fully exact results. Approximate computing has been proposed for highly energy-efficient systems targeting the above-mentioned emerging error-tolerant applications; approximate computing consists of approximately (inexactly) processing data to save power and achieve high performance, while results remain at an acceptable level for subsequent use. This tutorial starts with the motivation of approximate computing and then it reviews current techniques for approximate hardware designs. This tutorial will cover the following topics:
- Approximate Computing: Introduction and Principles (by Fabrizio Lombardi)
- Characterization of Approximate Arithmetic Circuits (by Jie Han)
- Error Compensation Techniques and Approximate DSP Modules (by Fabrizio Lombardi)
- Machine Learning Applications and Security (by Weiqiang Liu)
- Approximate Computing for Safety-Critical Applications (by Alberto Bosio)
Directions for future work in approximate computing will also be provided. The tutorial is presented and tailored to the EDA community and its technical interests.
Schedule:
13:15 - 13:20 Start of the Tutorial
13:20 - 13:45 Topic 1 - Fabrizio Lombardi
Title: Approximate Computing: Introduction and Principles
13:45 - 14:35 Topic 2 - Jie Han
Title: Characterization of Approximate Arithmetic Circuits
14:35 - 15:00 Topic 3 - Fabrizio Lombardi
Title: Error Compensation Techniques and Approximate DSP Modules
15:00 - 15:30 Coffee Break
15:30 - 16:20 Topic 4 - Weiqiang Liu
Title: Approximate Computing: Machine Learning Applications and Security
16:20 - 17:10 Topic 5 - Alberto Bosio
Title: Approximate Computing for Safety-Critical Applications
17:10 - 17:15 End of the Tutorial
Necessary background:
Knowledge of Computer Architecture and Computer Arithmetic
Acknowledgement:
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956090.
A.2 Disruptive and Nanoelectronics-based edge AI computing systems
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 17:30 CET - 19:00 CET
Session chair:
David Atienza, EPFL, CH
Session co-chair:
Ayse Coskun, Boston University, US
Progress in process technology has enabled the miniaturization of data processing elements, radio transceivers, and sensors for a large set of physiological phenomena. Autonomous sensor nodes, also called edge computing systems, can monitor and react unobtrusively during our daily lives. Nonetheless, the need for automated analysis and interpretation of complex signals poses critical design challenges, which can potentially be addressed (in terms of power consumption, performance, or size) by using nanoelectronics. These new technologies can enable us to go beyond key limitations of CMOS-based technology for particular applications, such as healthcare. This special session covers the latest trends towards bringing AI/ML to edge computing, as well as alternative design paradigms and the use of nanoelectronic technologies for the next generation of edge AI systems.
Time | Label | Presentation Title Authors |
---|---|---|
17:30 CET | A.2.1 | TINY MACHINE LEARNING FOR IOT 2.0 Speaker and Author: Vijay Janapa Reddi, Harvard University, US Abstract Tiny machine learning (TinyML) is a fast-growing field at the intersection of ML algorithms and low-cost embedded systems. TinyML enables a rich and wide array of on-device sensor data analysis (vision, audio, IMU, etc.) at ultra-low-power consumption. Processing data close to the sensor allows for an expansive new variety of always-on ML use-cases that preserve bandwidth, latency, and energy while improving responsiveness and maintaining data privacy. This talk introduces the vision behind TinyML and showcases some of the exciting applications that TinyML is enabling in the field, from supporting personalized health initiatives to unlocking the massive potential to improve manufacturing efficiencies. Yet, there are still numerous technical hardware and software challenges to address. Tight memory and storage constraints, extreme hardware heterogeneity, software fragmentation and a lack of relevant and commercially viable large-scale datasets pose a substantial barrier to unlocking TinyML for IoT 2.0. To this end, the talk also touches on the opportunities and future directions for unlocking the full potential of TinyML. |
18:00 CET | A.2.2 | HD COMPUTING WITH APPLICATIONS Speaker and Author: Tajana S. Rosing, UCSD, US Abstract Hyperdimensional (HD) computing is a class of brain-inspired learning algorithms that uses high dimensional random vectors (e.g. ~10,000 bits) to represent data along with simple and highly parallelizable operations. In this talk I will present some of my team’s recent work on hyperdimensional computing software and hardware infrastructure, including: i) novel algorithms supporting key cognitive computations in high-dimensional space, ii) novel HW systems for efficient HD computing on sensors and mobile devices that are orders of magnitude more efficient than the state of the art, at comparable accuracy. |
18:30 CET | A.2.3 | IMPROVING WEIGHT PERTURBATION ROBUSTNESS FOR MEMRISTOR-BASED HARDWARE DEPLOYMENT Speaker: Yiran Chen, Duke University, US Authors: Yiran Chen, Huanrui Yang and Xiaoxuan Yang, Duke University, US Abstract Crossbar-based memristors, owing to the advantages in executing vector-matrix multiplication, enable highly power-efficient and area-efficient neuromorphic system designs. However, deploying deep learning applications on memristor-based neuromorphic computing devices may lead to noticeable programming and runtime noises on the deployed model’s parameters, resulting in a significant performance degradation. In this talk, we will discuss algorithmic and system solutions to improve robustness of memristor-based designs. We tackle this problem by modeling the distribution of parameter noise and accounting for it in the model training process. More generally, we derive a theoretical robustness guarantee against weight perturbation from curvature perspective, leading to general robustness against hardware noise, quantization noise, and generalization noise. |
16.1 Young People Program Keynote: "Engineering skills that will advance quantum computing"
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 19:00 CET - 19:45 CET
Session chair:
Sara Vinco, Politecnico di Torino, IT
Session co-chair:
Anton Klotz, Cadence Design Systems, DE
Quantum computing is a computing paradigm that exploits fundamental principles of quantum mechanics to tackle problems in mathematics, chemistry, and material science that require particularly extensive computational resources. Its power is derived from a quantum bit (qubit), a physical system that can be in a superposition state and entangled with other qubits. Quantum computing is the main driver behind the phenomenal development of selected areas in electronic engineering (such as cryogenic CMOS), computer sciences, machine learning, material sciences, etc. Many of the challenges in creating practical quantum computers are engineering challenges. In this talk, we would like to discuss what challenges quantum computing has brought to the fields of electronic and computer engineering. We would also like to discuss quantum engineering and what skills are required to begin a career in quantum engineering.
Speaker's bio: Elena Blokhina (Senior Member, IEEE) received the M.Sc. degree in physics and the Ph.D. degree in physical and mathematical sciences from Saratov State University, Russia, in 2002 and 2006, respectively, and the Habilitation HDR degree in electronic engineering from UPMC Sorbonne Universities, France, in 2017. Since 2007, she has been with University College Dublin, where she is currently an Associate Professor. Since 2019, she has also been with Equal1 Labs, where she is CTO. Her current research interests focus on the theory, modelling and characterisation of semiconductor quantum devices, quantum computing, modelling and simulations of nonlinear systems and multi-physics simulations. Prof. Blokhina was elected to serve as a member of the Board of Governors of the IEEE Circuits and Systems Society from 2013 to 2015 and was re-elected for the term 2015 to 2017. She has served as the Programme Co-Chair and General Co-Chair of multiple editions of the IEEE International Conference on Electronics, Circuits and Systems and the IEEE International Symposium on Integrated Circuits and Systems. From 2016 to 2017, she was an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, and from 2018 to 2021 she was the Deputy Editor-in-Chief of that journal. She has served as a member of organizing committees, review and programme committees, a session chair, and a track chair at many leading international conferences on microelectronic circuits and systems and device physics.
Robert Bogdan Staszewski (Fellow, IEEE) received the B.Sc. (summa cum laude), M.Sc., and Ph.D. degrees in electrical engineering from The University of Texas at Dallas, Richardson, TX, USA, in 1991, 1992, and 2002, respectively. From 1991 to 1995, he was with Alcatel Network Systems, Richardson, Texas, involved in SONET cross-connect systems for fiber optics communications. He joined Texas Instruments Incorporated, Dallas, TX, USA, in 1995, where he was elected as a Distinguished Member of Technical Staff (limited to 2% of technical staff). From 1995 to 1999, he was engaged in advanced CMOS read channel development for hard disk drives. In 1999, he co-started the Digital RF Processor (DRP) group within Texas Instruments with a mission to invent new digitally intensive approaches to traditional RF functions for integrated radios in deeply-scaled CMOS technology. He was appointed as a CTO of the DRP Group from 2007 to 2009. In 2009, he joined Delft University of Technology, Delft, The Netherlands, where he currently holds a guest appointment of a Full Professor (Antoni van Leeuwenhoek Hoogleraar). Since 2014, he has been a Full Professor with University College Dublin (UCD), Dublin, Ireland. He is also a Co-Founder of a startup company, Equal1 Labs, with design centers located in Silicon Valley and Dublin, Ireland, aiming to produce single-chip CMOS quantum computers. He has authored or coauthored five books, seven book chapters, 140 journal and 210 conference publications, and holds 210 issued U.S. patents. His research interests include nanoscale CMOS architectures and circuits for frequency synthesizers, transmitters and receivers, and quantum computers. Prof. Staszewski was a recipient of the 2012 IEEE Circuits and Systems Industrial Pioneer Award. In May 2019, he received the title of Professor from the President of the Republic of Poland. He was also the TPC Chair of the 2019 European Solid-State Circuits Conference (ESSCIRC), Krakow, Poland.
Time | Label | Presentation Title Authors |
---|---|---|
19:00 CET | 16.1.1 | ENGINEERING SKILLS THAT WILL ADVANCE QUANTUM COMPUTING Speaker and Authors: Elena Blokhina and Robert Staszewski, University College Dublin, IE Abstract Quantum computing is a computing paradigm that exploits fundamental principles of quantum mechanics to tackle problems in mathematics, chemistry, and material science that require particularly extensive computational resources. Its power is derived from a quantum bit (qubit), a physical system that can be in a superposition state and entangled with other qubits. Quantum computing is the main driver behind the phenomenal development of selected areas in electronic engineering (such as cryogenic CMOS), computer sciences, machine learning, material sciences, etc. Many of the challenges in creating practical quantum computers are engineering challenges. In this talk, we would like to discuss what challenges quantum computing has brought to the fields of electronic and computer engineering. We would also like to discuss quantum engineering and what skills are required to begin a career in quantum engineering. |
16.2 Young People Program Panel
Add this session to my calendar
Date: Monday, 21 March 2022
Time: 19:45 CET - 20:30 CET
Session chair:
Anton Klotz, Cadence, DE
Session co-chair:
Xavier Salazar, Barcelona Supercomputing Center & HiPEAC, ES
Panellists:
Antonia Schmalz, SPRIND.org, DE
Ari Kulmala, Tampere University, FI
Anna Puig-Centelles, HADEA, ES
Alba Cervera, Barcelona Supercomputing Center, ES
The session will feature a round-table discussion on different views of, and opportunities in, high-end computer science research and careers. Speakers with heterogeneous backgrounds and positions have been invited to share their insights and valuable knowledge on these different paths.
IP.2_1 Interactive presentations
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_1.1 | (Best Paper Award Candidate) G-GPU: A FULLY-AUTOMATED GENERATOR OF GPU-LIKE ASIC ACCELERATORS Speaker: TIAGO DIADAMI PEREZ, Tallinn University of Technology (TalTech), EE Authors: Tiago Diadami Perez1, Márcio Gonçalves2, José Rodrigo Azambuja2, Leonardo Gobatto2, Marcelo Brandalero3 and Samuel Pagliarini1 1Tallinn University of Technology (TalTech), EE; 2UFRGS, BR; 3Brandenburg University of Technology, DE Abstract Modern Systems on Chip (SoC), almost as a rule, require accelerators for achieving energy efficiency and high performance for specific tasks that are not necessarily well suited for execution in standard processing units. Considering the broad range of applications and necessity for specialization, the design of SoCs has thus become expressively more challenging. In this paper, we put forward the concept of G-GPU, a general-purpose GPU-like accelerator that is not application-specific but still gives benefits in energy efficiency and throughput. Furthermore, we have identified an existing gap for these accelerators in ASIC, for which no known automated generation platform/tool exists. Our solution, called GPUPlanner, is an open-source generator of accelerators, from RTL to GDSII, that addresses this gap. Our analysis results show that our automatically generated G-GPU designs are remarkably efficient when compared against the popular CPU architecture RISC-V, presenting speed-ups of up to 223 times in raw performance and up to 11 times when the metric is performance derated by area. These results are achieved by executing a design space exploration of the GPU-like accelerators, where the memory hierarchy is broken in a smart fashion and the logic is pipelined on demand. Finally, tapeout-ready layouts of the G-GPU in 65nm CMOS are presented. |
IP.2_1.2 | (Best Paper Award Candidate) EFFICIENT TRAVELING SALESMAN PROBLEM SOLVERS USING THE ISING MODEL WITH SIMULATED BIFURCATION Speaker: Tingting Zhang, University of Alberta, CA Authors: Tingting Zhang and Jie Han, University of Alberta, CA Abstract An Ising model-based solver has shown efficiency in obtaining suboptimal solutions for combinatorial optimization problems. As an NP-hard problem, the traveling salesman problem (TSP) plays an important role in various routing and scheduling applications. However, the execution speed and solution quality significantly deteriorate using a solver with simulated annealing (SA) due to the quadratically increasing number of spins and strong constraints placed on the spins. The ballistic simulated bifurcation (bSB) algorithm utilizes the signs of Kerr-nonlinear parametric oscillators’ positions as the spins’ states. It can update the states in parallel to alleviate the time explosion problem. In this paper, we propose an efficient method for solving TSPs by using the Ising model with bSB. Firstly, the TSP is mapped to an Ising model without external magnetic fields by introducing a redundant spin. Secondly, various evolution strategies for the introduced position and different dynamic configurations of the time step are considered to improve the efficiency in solving TSPs. The effectiveness is specifically discussed and evaluated by comparing the solution quality to SA. Experiments on benchmark datasets show that the proposed bSB-based TSP solvers offer superior performance in solution quality and achieve a significant speed up in runtime than recent SA-based ones. |
IP.2_1.3 | (Best Paper Award Candidate) PROVIDING RESPONSE TIMES GUARANTEES FOR MIXED-CRITICALITY NETWORK SLICING IN 5G Speaker: Andrea Nota, TU Dortmund, DE Authors: Andrea Nota, Selma Saidi, Dennis Overbeck, Fabian Kurtz and Christian Wietfeld, TU Dortmund, DE Abstract Mission critical applications in domains such as Industry 4.0, autonomous vehicles or Smart Grids are increasingly dependent on flexible, yet highly reliable communication systems. In this context, Fifth Generation of mobile Communication Networks (5G) promises to support mixed-criticality applications on a single unified physical communication network. This is achieved by a novel approach known as network slicing, that promises to fulfil diverging requirements while providing strict separation between network tenants. We focus in this work on hard performance guarantees by formalizing an analytical method for bounding response times in mixed-criticality 5G network slicing. We reduce pessimism considering models on workload variations. |
IP.2_2 Interactive presentations
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_2.1 | (Best Paper Award Candidate) SCI-FI: CONTROL SIGNAL, CODE, AND CONTROL FLOW INTEGRITY AGAINST FAULT INJECTION ATTACKS Speaker: Thomas Chamelot, University Grenoble Alpes, CEA, List, FR Authors: Thomas Chamelot1, Damien Couroussé1 and Karine Heydemann2 1University Grenoble Alpes, CEA, LIST, FR; 2Sorbonne Université, CNRS, FR Abstract Fault injection attacks have become a serious threat against embedded systems. Recently, Laurent et al. have reported that some faults inside the microarchitecture escape all typical software fault models and so software counter-measures. Moreover, state-of-the-art counter-measures, hardware-only or with hardware support, do not consider the integrity of microarchitectural control signals that are the target of these faults. We present SCI-FI, a counter-measure for Control Signal, Code, and Control-Flow Integrity against Fault Injection attacks. SCI-FI combines the protection of pipeline control signals with a fine-grained code and control-flow integrity mechanism, and can additionally provide code authentication. We evaluate SCI-FI by extending a RISC-V core. The average hardware area overheads range from 6.5% to 23.8%, and the average code size and execution time increase by 25.4% and 17.5% respectively. |
IP.2_2.2 | XTENSTORE: FAST SHIELDED IN-MEMORY KEY-VALUE STORE ON A HYBRID X86-FPGA SYSTEM Speaker: Hyungon Moon, UNIST, KR Authors: Hyunyoung Oh1, Dongil Hwang2, Maja Malenko3, Myunghyun Cho2, Hyungon Moon4, Marcel Baunach3 and Yunheung Paek2 1Seoul National University, KR; 2Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University, KR; 3Graz University of Technology, AT; 4UNIST, KR Abstract We propose XtenStore, a system that extends the existing SGX-based secure in-memory key-value store with an external hardware accelerator in order to ensure comparable security guarantees with lower performance degradation. The accelerator is implemented on a commodity FPGA card that is readily connected with the x86 CPU via PCIe interconnect to form a hybrid x86-FPGA system. In comparison to the prior SGX-based work, XtenStore improves the throughput by 4-33x, and exhibits considerably shorter tail latency (>23x, 99th-percentile). |
IP.2_2.3 | LEARNING TO MITIGATE ROWHAMMER ATTACKS Speaker: Biresh Kumar Joardar, Duke University, US Authors: Biresh Kumar Joardar, Tyler Bletsch and Krishnendu Chakrabarty, Duke University, US Abstract Rowhammer is a security vulnerability that arises due to the undesirable electrical interaction between physically adjacent rows in DRAMs. Rowhammer attacks cause bit flips in the neighboring rows by repeatedly accessing (hammering) a DRAM row. This phenomenon has been exploited to craft many types of attacks in platforms ranging from edge devices to datacenter servers. Existing DRAM protections using error-correction codes and targeted row refresh are not adequate for defending against Rowhammer attacks. In this work, we propose a Rowhammer-detection solution using machine learning (ML). Experimental evaluation shows that the proposed technique can reliably detect different types of Rowhammer attacks (both real and artificially engineered) and prevent bit flips. Moreover, the ML model introduces less power and performance overheads on average compared to two recently proposed Rowhammer mitigation techniques, namely Graphene and Blockhammer, for 26 different applications from the Parsec, Pampar, and Splash-2 benchmark suites |
IP.2_3 Interactive presentations
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_3.1 | ONCE FOR ALL SKIP: EFFICIENT ADAPTIVE DEEP NEURAL NETWORKS Speaker: Yu Yang, Yunnan University, CN Authors: Yu Yang, Di Liu, Hui Fang, Yi-Xiong Huang, Ying Sun and Zhi-Yuan Zhang, Yunnan University, CN Abstract In this paper, we propose a new module, namely extit{once for all skip} (OFAS), for adaptive deep neural networks to efficiently control the block skip within a DNN model. The novelty of OFAS is that it only needs to compute once for all skippable blocks to determine their execution states. Moreover, since adaptive DNN models with OFAS cannot achieve the best accuracy and efficiency in end-to-end training, we propose a reinforcement learning-based training method to enhance the training procedure. The experimental results with different models and datasets demonstrate the effectiveness and efficiency in comparison to the state of the arts. The code is available at url{https://github.com/ieslab-ynu/OFAS}. |
IP.2_3.2 | SELF-AWARE MIMO BEAMFORMING SYSTEMS: DYNAMIC ADAPTATION TO CHANNEL CONDITIONS AND MANUFACTURING VARIABILITY Speaker: Suhasini Komarraju, Georgia Institute of Technology, US Authors: Suhasini Komarraju and Abhijit Chatterjee, Georgia Institute of Technology, US Abstract Emerging wireless technologies employ MIMO beamforming antenna arrays to improve channel Signal-to-Noise Ratio (SNR). The increased dynamic range of channel SNR values that can be accommodated, creates power stress on Radio Frequency (RF) electronic circuitry. To alleviate this, we propose an approach in which the circuitry along with other transmission coding parameters can be dynamically tuned in response to channel SNR and beam-steering angle to either minimize power consumption or maximize throughput in the presence of manufacturing process variations while meeting a specified Bit Error Rate (BER) limit. The adaptation control policy is learned online and is facilitated by information obtained from testing of the RF circuitry before deployment. |
IP.2_3.3 | SALVAGING RUNTIME BAD BLOCKS BY SKIPPING BAD PAGES FOR IMPROVING SSD PERFORMANCE Speaker: Mincheol Kang, KAIST, KR Authors: Junoh Moon, Mincheol Kang, Wonyoung Lee and Soontae Kim, KAIST, KR Abstract Recent research has revealed that runtime bad blocks are found in the early lifespan of solid state drives. The reduction in overprovisioning space due to runtime bad blocks may well have a negative impact on performance as it weakens the chances of selecting a better victim block during garbage collection. Moreover, previous studies focused on reusing worn- out bad blocks exceeding a program/erase cycle threshold, leaving the problem of runtime bad blocks unaddressed. Based on this observation, we present a salvation scheme for runtime bad blocks. This paper reveals that these blocks can be identified when a page write fails at runtime. Furthermore, we introduce a method to salvage functioning pages from runtime bad blocks. Consequently, the loss in the overprovisioning space can be minimized even after the occurrence of runtime bad blocks. Experimental results show a 26.3% reduction in latency and a 25.6% increase in throughput compared to the baseline at a conservative bad block ratio of 0.45%. Additionally, our results confirm that almost no overhead was observed. |
IP.2_4 Interactive presentations
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_4.1 | SACC: SPLIT AND COMBINE APPROACH TO REDUCE THE OFF-CHIP MEMORY ACCESSES OF LSTM ACCELERATORS Speaker: Saurabh Tewari, Indian Institute Of Technology, IN Authors: Saurabh Tewari1, Anshul Kumar2 and Kolin Paul3 1I.I.T.Delhi, IN; 2I.I.T. Delhi, IN; 3IIT Delhi, IN Abstract Long Short-Term Memory (LSTM) networks are widely used in speech recognition and natural language processing. Recently, a large number of LSTM accelerators have been proposed for the efficient processing of LSTM networks. The high energy consumption of these accelerators limits their usage in energy-constrained systems. LSTM accelerators repeatedly access large weight matrices from off-chip memory, significantly contributing to energy consumption. Reducing off-chip memory access is the key to improving the energy efficiency of these accelerators. We propose a data reuse approach that splits and combines the LSTM cell computations in a way that reduces the off-chip memory accesses of LSTM hidden state matrices by 50%. In addition, the data reuse efficiency of our approach is independent of on-chip memory size, making it more suitable for small on-chip memory LSTM accelerators. Experimental results show that our approach reduces off-chip memory access by 28% and 32%, and energy consumption by 13% and 16%, respectively, compared to conventional approaches for character level Language Modelling and Speech Recognition LSTM models. |
IP.2_4.2 | NPU-ACCELERATED IMITATION LEARNING FOR THERMAL- AND QOS-AWARE OPTIMIZATION OF HETEROGENEOUS MULTI-CORES Speaker: Martin Rapp, Karlsruhe Institute of Technology, DE Authors: Martin Rapp1, Nikita Krohmer1, Heba Khdr1 and Joerg Henkel2 1Karlsruhe Institute of Technology, DE; 2Karlsruhe Institute of Technology, DE Abstract Task migration and dynamic voltage and frequency scaling (DVFS) are indispensable means in thermal optimization of a heterogeneous clustered multi-core processor under user-defined quality of service (QoS) targets. However, selecting the core to execute each application and the voltage/frequency (V/f) levels of each cluster is a complex problem because 1) the diverse characteristics and QoS targets of applications require different optimizations, and 2) V/f levels are often shared between cores on a cluster, which requires a global optimization considering all running applications. State-of-the-art techniques for power or temperature minimization either rely on measurements that are often not available (such as power) or fail to consider all the dimensions of the problem (e.g., by using simplified analytical models). Imitation learning (IL) makes it possible to exploit the optimality of an oracle policy, yet at low run-time overhead, by training a model from oracle demonstrations. We are the first to employ IL for temperature minimization under QoS targets. We tackle the complexity by using a neural network (NN) model and accelerate the NN inference using a neural processing unit (NPU). While such NN accelerators are becoming increasingly widespread on end devices, they are so far only used to accelerate user applications. In contrast, we use an accelerator on a real platform to accelerate NN-based resource management. Our evaluation on a HiKey970 board with an Arm big.LITTLE CPU and an NPU shows significant temperature reductions at a negligible overhead while satisfying QoS targets. |
IP.2_4.3 | BMPQ: BIT-GRADIENT SENSITIVITY DRIVEN MIXED-PRECISION QUANTIZATION OF DNNS FROM SCRATCH Speaker: Souvik Kundu, University of Southern California, US Authors: Souvik Kundu1, Shikai Wang2, Qirui Sun2, Peter Beerel2 and Massoud Pedram1 1USC, US; 2University of Southern California, US Abstract Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these methods either sacrifice significant performance compared to the 32-bit floating-point (FP-32) baseline or rely on compute expensive iterative training policy that requires the availability of a pre-trained baseline. To address this issue, this paper presents BMPQ, a training method that uses bit gradients to analyze layer sensitivities and yield mixed-precision quantized models. BMPQ requires a single training iteration but does not need a pre-trained baseline. It uses an integer linear program (ILP) to dynamically adjust the precision of layers during training subject to a fixed hardware budget. To evaluate the efficacy of BMPQ, we conduct extensive experiments with VGG16 and ResNet18 on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Compared to the baseline FP-32 models, BMPQ can yield models that are 15.4× fewer parameter bits with a negligible drop in accuracy. Compared to the SOTA “during training” mixed-precision training scheme, our models are 2.1x, 2.2x, and 2.9x smaller, on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively, with improved accuracy of up to 14.54%. We have open-sourced our trained models and test code for reproducibility. |
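For readers who want a concrete feel for the ILP-based bit-width assignment that IP.2_4.3 (BMPQ) describes, the following minimal Python sketch illustrates the general idea of picking one precision per layer under a hardware bit budget. The layer names, sensitivity scores, candidate precisions, and budget below are illustrative assumptions, not values or code from the paper.

```python
# Hypothetical sketch of sensitivity-driven mixed-precision bit allocation via ILP.
import pulp

layers = ["conv1", "conv2", "fc1"]
params = {"conv1": 1_000, "conv2": 4_000, "fc1": 2_000}   # parameter count per layer (assumed)
sens   = {"conv1": 0.9,   "conv2": 0.5,   "fc1": 0.2}     # bit-gradient sensitivity scores (assumed)
bits   = [2, 4, 8]                                        # candidate precisions
budget = 30_000                                           # total parameter-bit budget (assumed)

prob = pulp.LpProblem("bit_allocation", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", [(l, b) for l in layers for b in bits], cat="Binary")

# exactly one precision per layer
for l in layers:
    prob += pulp.lpSum(x[(l, b)] for b in bits) == 1
# stay within the fixed hardware bit budget
prob += pulp.lpSum(params[l] * b * x[(l, b)] for l in layers for b in bits) <= budget
# objective: give more bits to the layers with higher sensitivity
prob += pulp.lpSum(sens[l] * b * x[(l, b)] for l in layers for b in bits)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({l: next(b for b in bits if x[(l, b)].value() > 0.5) for l in layers})
```

Running the sketch prints one chosen precision per layer; in an actual bit-gradient-driven training flow the sensitivity scores would be re-estimated during training and the ILP re-solved as the budget is enforced.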
IP.2_5 Interactive presentations
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_5.1 | EM SCA & FI SELF-AWARENESS AND RESILIENCE WITH SINGLE ON-CHIP LOOP & ML CLASSIFIERS Speaker: Archisman Ghosh, Purdue University, US Authors: Archisman Ghosh1, Debayan Das2, Santosh Ghosh2 and Shreyas Sen1 1Purdue University, US; 2Intel Corp., US Abstract Securing ICs is becoming increasingly challenging with improvements in electromagnetic (EM) side-channel analysis (SCA) and fault injection (FI) attacks. In this work, we develop a pro-active approach to detect and counter these attacks by embedding a single on-chip integrated loop around a crypto core (AES-256), designed and fabricated using TSMC 65nm process. The measured results demonstrate that the proposed system 1) provides EM-Self-awareness by acting as an on-chip H-field sensor, detecting voltage/clock glitching fault-attacks; 2) senses an approaching EM probe to detect an incoming threat, and 3) can be used to induce EM noise to increase resilience against EM attacks. |
IP.2_5.2 | RTSEC: AUTOMATED RTL CODE AUGMENTATION FOR HARDWARE SECURITY ENHANCEMENT Speaker: Orlando Arias, University of Florida, US Authors: Orlando Arias1, Zhaoxiang Liu2, Xiaolong Guo3, Yier Jin1 and Shuo Wang1 1University of Florida, US; 2Kansas State University, US; 3Electrical and Computer Engineering Department, Kansas State University, US Abstract Current hardware designs have increased in complexity, resulting in a reduced ability to perform security checks on them. Further, the addition of any security features to these designs is still largely manual which further complicates the design and integration process. In this paper, we address these shortcomings by introducing RTSec as a framework which is capable of performing security analysis on designs as well as integrating security features directly into the HDL code, a feature that commercial EDA tools do not provide. RTSec first breaks down HDL code into an Abstract Syntax Tree which is then used to infer the logic of the design. We demonstrate how RTSec can be utilized to automatically include security mechanisms in RTL designs: watermarking and logic locking. We also compare the efficacy of our analysis algorithms with state of the art tools, demonstrating that RTSec has capabilities equal or superior to those of state of the art tools while also providing the means of enhancing security features to the design. |
IP.2_5.3 | INTER-IP MALICIOUS MODIFICATION DETECTION THROUGH STATIC INFORMATION FLOW TRACKING Speaker: Zhaoxiang Liu, Kansas State University, CN Authors: Zhaoxiang Liu1, Orlando Arias2, Weimin Fu1, Yier Jin2 and Xiaolong Guo3 1Kansas State University, US; 2University of Florida, US; 3Electrical and Computer Engineering Department, Kansas State University, US Abstract To help expand the usage of formal methods in the hardware security domain, we propose a static register-transfer level (RTL) security analysis framework and an electronic design automation (EDA) tool named If-Tracker to support the proposed framework. Through this framework, a data-flow model will be automatically extracted from the RTL description of the SoC. Information flow security properties will then be generated. The tool checks all possible inter-IP paths to verify whether any property violations exist. The effectiveness of the proposed framework is demonstrated on customized SoC designs using AMBA bus where malicious modifications are inserted across multiple IPs. Existing IP level security analysis tools cannot detect such Trojans. Compared to commercial formal tools such as Cadence JasperGold and Synopsys VC-Formal, our framework provides a much simpler user interface and can identify more types of malicious modifications. |
IP.2_6 Interactive presentations
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_6.1 | MANY-LAYER HOTSPOT DETECTION BY LAYER-ATTENTIONED VISUAL QUESTION ANSWERING Speaker: Yen-Shuo Chen, National Taiwan University, TW Authors: Yen-Shuo Chen and Iris Hui-Ru Jiang, National Taiwan University, TW Abstract Exploring hotspot patterns and correcting them as early as possible is crucial to guarantee yield and manufacturability. Hotspot patterns can be classified into various types according to potentially induced defects. In modern layouts, defects are caused by not only the geometry on one specific layer but also the accumulated influence from other layers. Existing hotspot detection and pattern classification methods, however, consider only the geometry on one single layer or one main layer with adjacent layers. They cannot recognize the corresponding defect type for a hotspot pattern, either. Therefore, in this paper, we investigate the linkage between many-layer hotspot patterns and corresponding potentially induced defect types. We first cast the many-layer critical hotspot pattern extraction task as a visual question answering (VQA) problem: Considering a many-layer layout pattern an image and a defect type a question, we devise a layer-attentioned VQA model to answer whether the pattern is critical to the queried defect type. Simply considering all layers equally may dilute the key features of hotspot patterns. Thus, our layer attention mechanism attempts to identify the importance and relevance of each layer for different types. Experimental results show that the proposed model has superior performance and question-answering ability based on modern layouts with more than thirty layout layers. |
IP.2_6.2 | RESTORE: REAL-TIME TASK SCHEDULING ON A TEMPERATURE AWARE FINFET BASED MULTICORE Speaker: Shounak Chakraborty, Department of Computer Science, Norwegian University of Science and Technology (NTNU), NO Authors: Yanshul Sharma1, Sanjay Moulik1 and Shounak Chakraborty2 1IIIT Guwahati, IN; 2Norwegian University of Science and Technology, NO Abstract In this work, we propose RESTORE that exploits the unique thermal feature of FinFET based multicore platforms, where processing speed increases with temperature, in the context of time-criticality to meet other design constraints of real-time systems. RESTORE is a temperature aware real-time scheduler for FinFET based multicore systems that first derives a task-to-core allocation, and prepares a schedule. Next, it balances the performance and temperature on the fly by incorporating a prudential temperature cognizant voltage/frequency scaling while guaranteeing task deadlines. Simulation results show that RESTORE is able to maintain a safe and stable thermal status (peak temperature below 80 °C), hence the frequency (3.7 GHz on average), that ensures legitimate time-critical performance for a variety of workloads while surpassing the state of the art. |
IP.2_6.3 | ONLINE PERFORMANCE AND POWER PREDICTION FOR EDGE TPU VIA COMPREHENSIVE CHARACTERIZATION Speaker: Yang Ni, University of California, Irvine, US Authors: Yang Ni1, Yeseong Kim2, Tajana S. Rosing3 and Mohsen Imani4 1University of California, Irvine, US; 2DGIST, KR; 3UCSD, US; 4University of California Irvine, US Abstract In this paper, we characterize and model the performance and power consumption of Edge TPU, which efficiently accelerates deep learning (DL) inference in a low-power environment. The systolic array is a high-throughput computation architecture, and its usage at the edge excites our interest in its performance and power patterns. We perform an extensive study for various neural network settings and sizes using more than 10,000 DL models. Through comprehensive exploration, we profile which factors highly influence the inference time and power to run DL models. We present our key observations on the relation between the performance/power and DL model complexity to enable hardware-aware optimization and design decisions. For example, our measurement shows that energy/performance is not linearly-proportional to the number of MAC operations. In fact, as the computation and DL model size increase, the performance follows a stepped pattern. Hence, an accurate estimate should consider other features of DL models such as on-chip/off-chip memory usages. Based on the characterization, we propose a modeling framework, called PETET, which performs online predictions for the performance and power of Edge TPU. The proposed method automatically identifies the relationship of the performance, power, and memory usages to the DL model settings based on machine learning techniques. |
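As a rough illustration of the kind of learned performance model that IP.2_6.3 argues for (latency is not explained by MAC count alone, so a predictor should also see memory-related features), the sketch below fits a regressor on synthetic data. The feature set, the 8 MB on-chip threshold, and the toy latency formula are assumptions made purely for illustration and are unrelated to the actual PETET implementation.

```python
# Hypothetical sketch: predict accelerator latency from model features beyond MACs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
macs    = rng.uniform(1e6, 5e8, size=200)    # multiply-accumulate counts of 200 toy models
weights = rng.uniform(1e5, 2e7, size=200)    # parameter footprint in bytes (assumed feature)

# Synthetic latency in ms: roughly linear in MACs, plus a step once the weights
# no longer fit an assumed 8 MB of on-chip memory -- mimicking a "stepped" pattern.
latency = macs / 4e8 + np.where(weights > 8e6, 2.0, 0.0) + rng.normal(0, 0.05, 200)

X = np.column_stack([macs, weights])
model = GradientBoostingRegressor().fit(X, latency)
print(model.predict([[2e8, 1e7]]))           # latency estimate for an unseen model
```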
IP.2_7 Interactive presentations
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.2_7.1 | PROACTIVE RUN-TIME MITIGATION FOR TIME-CRITICAL APPLICATIONS USING DYNAMIC SCENARIO METHODOLOGY Speaker: Ji-Yung Lin, IMEC, TW Authors: Ji-Yung Lin1, Pieter Weckx2, Subrat Mishra2, Alessio Spessot3 and Francky Catthoor2 1KU Leuven, BE; 2IMEC, BE; 3Imec, BE Abstract Energy saving is important for both high-end processors and battery-powered devices. However, for time-critical application such as car auto-driving systems and multimedia streaming, saving energy by slowing down speed poses a threat to timing guarantee of the applications. The worst-case execution time (WCET) method is a widespread solution to this problem, but its static execution time model is not sufficient anymore for highly dynamic hardware and applications nowadays. In this work, a fully proactive run-time mitigation methodology is proposed for energy saving while ensuring timing guarantee. This methodology introduces heterogeneous datapath options, a fast fine-grained knob which enables processors to switch between datapaths of different speed and energy levels with a switching time of only tens of clock cycles. In addition, a run-time controller using a dynamic scenario methodology is developed. This methodology incorporates execution time prediction and timing guarantee criteria calculation, so it can dynamically switch knobs for energy saving while rigorously still ensuring all timing guarantees. Simulation shows that the proposed methodology can mitigate a dynamic workload without any deadline misses, and at the same time energy can be saved. |
IP.2_7.2 | ANALYZING CAN'S TIMING UNDER PERIODICALLY AUTHENTICATED ENCRYPTION Speaker: Mingqing Zhang, TU Chemnitz, DE Authors: Mingqing Zhang1, Philip Parsch1, Henry Hoffmann2 and Alejandro Masrur1 1TU Chemnitz, DE; 2University of Chicago, US Abstract With increasing connectivity in the automotive domain, it has become easier to remotely access in-vehicle buses like CAN (Controller Area Network). This not only jeopardizes security, but it also exposes CAN's limitations. In particular, to reject replay and spoofing attacks, messages need to be authenticated, i.e., an authentication tag has to be included. As a result, messages become larger and need to be split in at least two frames due to CAN's restrictive payload. This increases the delay on the bus and, thus, some deadlines may start being missed compromising safety. In this paper, we propose a Periodically Authenticated Encryption (PAE) based on the observation that we do not need to send authentication tags with every single message on the bus, but only with a configurable frequency that allows meeting both safety and security requirements. Plausibility checks can then be used to detect whether non-authenticated messages sent in between two authenticated ones have been altered or are being replayed, e.g., the transmitted values exceed a given range or are not in accordance with previous ones. We extend CAN's known schedulability analysis to consider PAE and analyze its timing behavior based on an implementation on real hardware and on extensive simulations. |
IP.2_7.3 | TOWARDS ADC-LESS COMPUTE-IN-MEMORY ACCELERATORS FOR ENERGY EFFICIENT DEEP LEARNING Speaker: Utkarsh Saxena, Purdue University, US Authors: Utkarsh Saxena, Indranil Chakraborty and Kaushik Roy, Purdue University, US Abstract Compute-in-Memory (CiM) hardware has shown great potential in accelerating Deep Neural Networks (DNNs). However, most CiM accelerators for matrix vector multiplication rely on costly analog to digital converters (ADCs) which becomes a bottleneck in achieving high energy efficiency. In this work, we propose a hardware-software co-design approach to reduce the aforementioned ADC costs through partial-sum quantization. Specifically, we replace ADCs with 1-bit sense amplifiers and develop a quantization aware training methodology to compensate for the loss in representation ability. We show that the proposed ADC-less DNN model achieves 1.1x-9.6x reduction in energy consumption while maintaining accuracy within 1\% of the DNN model without partial-sum quantization. |
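The following numpy sketch illustrates, in the spirit of IP.2_7.3, what replacing a multi-bit ADC with a 1-bit sense amplifier on crossbar partial sums can look like numerically. The crossbar slice size, the sign-based readout, and the averaging step are illustrative assumptions rather than the authors' scheme; in practice a quantization-aware training step would learn how to recombine the binarized partial sums.

```python
# Hypothetical sketch: 1-bit (sign) readout of crossbar partial sums vs. exact dot product.
import numpy as np

rng = np.random.default_rng(0)
w = rng.choice([-1.0, 1.0], size=256)    # one output column of binary weights
x = rng.standard_normal(256)             # input activations
rows = 64                                # rows summed per crossbar access (assumed)

exact = w @ x                            # what a high-resolution ADC would recover

# ADC-less readout: each 64-row partial sum is reduced to its sign by a
# 1-bit sense amplifier; a simple mean stands in for a learned recombination scale.
partials = [np.sign(w[i:i + rows] @ x[i:i + rows]) for i in range(0, 256, rows)]
adc_less = float(np.mean(partials))

print(f"exact={exact:.3f}  adc_less_readout={adc_less:.3f}")
```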
IP.MPP Multi-Partner Projects – Interactive Presentations
Date: Tuesday, 22 March 2022
Time: 11:30 CET - 12:15 CET
The session is dedicated to multi-partner innovative and high-tech research projects addressing the DATE 2022 topics. The types of collaboration covered are projects funded by EU schemes (H2020, ESA, EIC, MSCA, COST, etc.), nationally- and regionally-funded projects, and collaborative research projects funded by industry. Depending on the stage of the project, the papers present the novelty of the project concepts, the relevance of the technical objectives to the DATE community, technical highlights of the project results, and insights into the lessons learnt in the project or the issues that remain open until the end of the project. In particular, three interactive presentations cover concepts for the embedded FPGA tile of the European Processor Initiative, a training network view on approximate computing trade-offs, and an open-source RISC-V SoC with an AI accelerator.
Label | Presentation Title Authors |
---|---|
IP.MPP.1 | TOWARDS RECONFIGURABLE ACCELERATORS IN HPC: DESIGNING A MULTIPURPOSE EFPGA TILE FOR HETEROGENEOUS SOCS Speaker: Juan Miguel de Haro Ruiz, Barcelona Supercomputing Center, ES Authors: Tim Hotfilter1, Juan Miguel de Haro Ruiz2, Fabian Kreß3, Carlos Alvarez4, Fabian Kempf3, Daniel Jimenez-Gonzalez5, Miquel Moreto2, imen baili6, Jesus Labarta2 and Juergen Becker3 1Karlsruhe institute of technology, DE; 2Barcelona Supercomputing Center, ES; 3Karlsruhe Institute of Technology, DE; 4Universitat Politècnica de Catalunya, ES; 5Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES; 6Technical Product Marketing, FR Abstract The goal of modern high performance computing platforms is to combine low power consumption and high throughput. Within the European Processor Initiative (EPI), such an SoC platform to meet the novel exascale requirements is built and investigated. As part of this project, we introduce an embedded Field Programmable Gate Array (eFPGA), adding flexibility to accelerate various workloads. In this article, we show our approach to design the eFPGA tile that supports the EPI SoC. While eFPGAs are inherently reconfigurable, their initial design has to be determined for tape-out. The design space of the eFPGA is explored and evaluated with different configurations of two HPC workloads, covering control and dataflow heavy applications. As a result, we present a well-balanced eFPGA design that can host several use cases and potential future ones by allocating 1% of the total EPI SoC area. Finally, our simulation results of the architectures on the eFPGA show great performance improvements over their software counterparts. |
IP.MPP.2 | TOWARDS APPROXIMATE COMPUTING FOR ACHIEVING ENERGY VS. ACCURACY TRADE-OFFS Speaker: Jari Nurmi, Tampere University, FI Authors: Jari Nurmi and Aleksandr Ometov, Tampere University, FI Abstract Despite the recent advances in semiconductor technology and energy-aware system design, the overall energy consumption of computing and communication systems is rapidly growing. On the one hand, the pervasiveness of these technologies everywhere in the form of mobile devices, cyber-physical embedded systems, sensor networks, wearables, social media and context-awareness, intelligent machines, broadband cellular networks, Cloud computing, and Internet of Things (IoT) has drastically increased the demand for computing and communications. On the other hand, the user expectations on features and battery life of online devices are increasing all the time, and it creates another incentive for finding good trade-offs between performance and energy consumption. One of the opportunities to address this growing demand is to utilize an Approximate Computing approach through software and hardware design. The APROPOS project aims at finding the balance between accuracy and energy consumption, and this short paper provides an initial overview of the corresponding roadmap, as the project is still in the initial stage. |
IP.MPP.3 | THE SELENE DEEP LEARNING ACCELERATION FRAMEWORK FOR SAFETY-RELEVANT APPLICATIONS Speaker: Laura Medina, Universitat Politècnica de València, ES Authors: Laura Medina1, Salvador Carrión1, Pablo Cerezo2, Tomás Picornell1, Josè Flich3, Carles Hernandez1, Markel Sainz4, Michael Sandoval4, Charles-Alexis Lefebvre4, Martin Ronnback5, Martin Matschnig6, Matthias Wess6 and Herber Taucher6 1Universitat Politècnica de València, ES; 2Universidad Politécnica de Valencia, ES; 3Associate Professor, Universitat Politècnica de València, ES; 4Ikerlan Technology Research Centre, Basque Research and Technology Alliance (BRTA), ES; 5Cobham Gaisler, SE; 6Siemens Technology, DE Abstract The goal of the H2020 SELENE project is the development of a flexible computing platform for autonomous applications that includes built-in hardware support for safety. The SELENE computing platform is an open-source RISC-V heterogeneous multicore system-on-chip (SoC) that includes 6 NOEL-V RISC-V cores and artificial intelligence accelerators. In this paper, we describe the approach we have followed in the SELENE project to accelerate neural network inference processes. Our intermediate results show that both the FPGA and ASIC accelerators provide real-time inference performance for the analyzed network models at a reasonable implementation cost. |
L.1 Panel on Quantum and Neuromorphic Computing: "What’s it like to be an Engineer for Emerging Computing Technologies?"
Date: Tuesday, 22 March 2022
Time: 12:30 CET - 14:00 CET
Session chair:
Anne Matsuura, Intel, US
Session co-chair:
Aida Todri Sanial, LIRMM, FR
Panellists:
Fernando Gonzalez Zalba, Quantum Motion Technologies, GB
Théophile Gonos, A.I. Mergence, FR
Robert Wille, Johannes Kepler University Linz, AT
In this session, we invite four neuromorphic and quantum engineers to share their experiences of becoming engineers and of working on emerging computing technologies. After the presentations, the floor will be opened for discussion and exchange with the moderator and the audience.
17.1 Brain- and Bio-inspired architectures and applications
Date: Tuesday, 22 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Michael Niemier, University of Notre Dame, US
Session co-chair:
François Rummens, CEA, FR
This session focuses on architectures and applications in the context of biochips and neural networks. It includes discussions of solutions for adaptive droplet routing and contamination-free switches for biochips. Another aspect is the combination of graph convolutional networks and processing-in-memory. Spiking neural networks try to replicate brain-like behavior; this session shows how this emerging technology can be combined with the concept of hyperdimensional computing and how the backpropagation-through-time approach can be applied more efficiently.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 17.1.1 | (Best Paper Award Candidate) ADAPTIVE DROPLET ROUTING FOR MEDA BIOCHIPS VIA DEEP REINFORCEMENT LEARNING Speaker: Mahmoud Elfar, Duke University, US Authors: Mahmoud Elfar, Tung-Che Liang, Krishnendu Chakrabarty and Miroslav Pajic, Duke University, US Abstract Digital microfluidic biochips (DMFBs) based on a micro-electrode-dot-array (MEDA) architecture provide fine-grained control and sensing of droplets in real-time. However, excessive actuation of microelectrodes in MEDA biochips can lead to charge trapping during bioassay execution, causing the failure of microelectrodes and erroneous bioassay outcomes. A recently proposed enhancement to MEDA allows run-time measurement of microelectrode health information, thereby enabling synthesis of adaptive routing strategies for droplets. However, existing synthesis solutions are computationally infeasible for large MEDA biochips that have been commercialized. In this paper, we propose a synthesis framework for adaptive droplet routing in MEDA biochips via deep reinforcement learning (DRL). The framework utilizes the real-time microelectrode health feedback to synthesize droplet routes that proactively minimize the likelihood of charge trapping. We show how the adaptive routing strategies can be synthesized using DRL. We implement the DRL agent, the MEDA simulation environment, and the bioassay scheduler using the OpenAI Gym environment. Our framework obtains adaptive routing policies efficiently for COVID-19 testing protocols on large arrays that reflect the sizes of commercial MEDA biochips available in the marketplace, significantly increasing probabilities of successful bioassay completion compared to existing methods. |
14:34 CET | 17.1.2 | CONTAMINATION-FREE SWITCH DESIGN AND SYNTHESIS FOR MICROFLUIDIC LARGE-SCALE INTEGRATION Speaker: Duan Shen, TU Munich, DE Authors: Duan Shen, Yushen Zhang, Mengchu Li, Tsun-Ming Tseng and Ulf Schlichtmann, TU Munich, DE Abstract Microfluidic large-scale integration (mLSI) biochips have developed rapidly in recent decades. The gap between design efficiency and application complexity has led to a growing interest in mLSI design automation. The state-of-the-art design automation tools for mLSI focus on the simultaneous co-optimisation of the flow and control layers but neglect potential contamination between different fluid reagents and products. Microfluidic switches, as fluid routers at the intersection of flow paths, are especially prone to contamination. State-of-the-art tools design the switches as spines with junctions, which aggravates the contamination problem. In this work, we present a contamination-free microfluidic switch design and a synthesis method to generate application-specific switches that can be employed by physical design tools for mLSI. We also propose a scheduling and binding method to transport the fluids with the least time and fewest resources. To reduce the number of pressure inlets, we consider pressure sharing between valves within the switch. Experimental results demonstrate that our methods show advantages in avoiding contamination and improving transportation efficiency over conventional methods. |
14:38 CET | 17.1.3 | EXPLOITING PARALLELISM WITH VERTEX-CLUSTERING IN PROCESSING-IN-MEMORY-BASED GCN ACCELERATORS Speaker: Yu Zhu, Tsinghua University, CN Authors: Yu Zhu, Zhenhua Zhu, Guohao Dai, Kai Zhong, Huazhong Yang and Yu Wang, Tsinghua University, CN Abstract Recently, Graph Convolutional Networks (GCNs) have shown powerful learning capabilities in graph processing tasks. Computing GCNs with conventional von Neumann architectures usually suffers from limited memory bandwidth due to the irregular memory access. Recent work has proposed Processing-In-Memory (PIM) architectures to overcome the bandwidth bottleneck in Convolutional Neural Networks (CNNs) by performing in-situ matrix-vector multiplication. However, the performance improvement and computation parallelism of existing CNN-oriented PIM architectures is hindered when performing GCNs because of the large scale and sparsity of graphs. To tackle these problems, this paper presents a parallelism enhancement framework for PIM-based GCN architectures. At the software level, we propose a fixed-point quantization method for GCNs, which reduces the PIM computation overhead with little accuracy loss. We also introduce the vertex clustering algorithm to the graph, minimizing the inter-cluster links and realizing cluster-level parallel computing on multi-core systems. At the hardware level, we design a Resistive Random Access Memory (RRAM) based multi-core PIM architecture for GCN, which supports the cluster-level parallelism. Besides, we propose a coarse-grained pipeline dataflow to cover the RRAM write costs and improve the GCN computation throughput. At the software/hardware interface level, we propose a PIM-aware GCN mapping strategy to achieve the optimal tradeoff between resource utilization and computation performance. We also propose edge dropping methods to reduce the inter-core communications with little accuracy loss. We evaluate our framework on typical datasets with multiple widely-used GCN models. Experimental results show that the proposed framework achieves 698x, 89x, and 41x speedup with 7108x, 255x, and 31x energy efficiency enhancement compared with CPUs, GPUs, and ASICs, respectively. |
14:42 CET | 17.1.4 | ACCELERATING SPATIOTEMPORAL SUPERVISED TRAINING OF LARGE-SCALE SPIKING NEURAL NETWORKS ON GPU Speaker: LING LIANG, University of California Santa Barbara, CN Authors: LING LIANG1, Zhaodong Chen1, Lei Deng2, Fengbin Tu1, Guoqi Li3 and Yuan Xie4 1UCSB, US; 2Tsinghua, CN; 3Tsinghua, CN; 4UCSB, US Abstract Spiking neural networks (SNNs) have great potential to achieve brain-like intelligence; however, they suffer from the low accuracy of conventional synaptic plasticity rules and low training efficiency on GPUs. Recently, the emerging backpropagation through time (BPTT) inspired learning algorithms bring new opportunities to boost the accuracy of SNNs, while training on GPUs still remains inefficient due to the complex spatiotemporal dynamics and huge memory consumption, which restricts the model exploration for SNNs and prevents the advance of neuromorphic computing. In this work, we build a framework to solve the inefficiency of BPTT-based SNN training on modern GPUs. To reduce the memory consumption, we optimize the dataflow by abandoning a part of intermediate data in the forward pass and recomputing them in the backward pass. Then, we customize kernel functions to accelerate the neural dynamics for all training stages. Finally, we provide a Pytorch interface to make our framework easy-to-deploy in real systems. Compared to a vanilla Pytorch implementation, our framework can achieve up to 2.13x end-to-end speedup and consume only 0.41x peak memory on the CIFAR10 dataset. Moreover, for the distributed training on the large ImageNet dataset, we can achieve up to 1.81x end-to-end speedup and consume only 0.38x peak memory. |
14:46 CET | 17.1.5 | HYPERSPIKE: HYPERDIMENSIONAL COMPUTING FOR MORE EFFICIENT AND ROBUST SPIKING NEURAL NETWORKS Speaker: Justin Morris, University of California, San Diego, US Authors: Justin Morris1, Hin Wai Lui2, Kenneth Stewart2, Behnam Khaleghi1, Anthony Thomas1, Thiago Marback1, Baris Aksanli3, Emre Neftci4 and Tajana S. Rosing5 1University of California, San Diego, US; 2University of California, Irvine, US; 3San Diego State University, US; 4UC Irvine, US; 5UCSD, US Abstract Today’s Machine Learning (ML) systems, especially those in server farms running workloads such as Deep Neural Networks, which require billions of parameters and many hours to train a model, consume a significant amount of energy. To combat this, researchers have been focusing on new emerging neuromorphic computing models. Two of those models are Hyperdimensional Computing (HDC) and Spiking Neural Networks (SNNs), both with their own benefits. HDC has various desirable properties that other ML algorithms lack, such as robustness to noise in the system, simple operations, and high parallelism. SNNs are able to process event based signal data in an efficient manner. In this paper, we combine these two neuromorphic methods to create HyperSpike. We utilize a single SNN layer to first process the event based data and transform it into a more traditional feature vector that HDC can interpret. Then, an HDC classifier is used to enable more efficient classification as well as robustness to errors. We additionally test HyperSpike against different levels of bit error rates to experimentally show that HyperSpike is on average 31.5× more robust to errors than SNNs using other classifiers as the last layer. We also propose an ASIC accelerator for HyperSpike that provides a 10× speedup and 19.3× more energy efficiency over traditional SNN networks run on Loihi chips. |
14:50 CET | 17.1.6 | Q&A SESSION Authors: Michael Niemier1 and François Rummens2 1University of Notre Dame, US; 2CEA, FR Abstract Questions and answers with the authors |
17.2 Attacks on Secure and Trustworthy Systems
Date: Tuesday, 22 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Emanuele Valea, CEA LIST, FR
Session co-chair:
Francesco Regazzoni, University of Amsterdam and Università della Svizzera italiana, CH
In the last two decades we have witnessed a massive development of devices containing different types of valuable assets. Moreover, the globalization of the semiconductor industry has led to new trust risks. This session includes five presentations proposing novel methods to bypass state-of-the-art countermeasures against security threats (cache timing attacks and side-channel-based CPU disassembly) and trust threats (hardware Trojans and overproduction).
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 17.2.1 | A DEEP-LEARNING APPROACH TO SIDE-CHANNEL BASED CPU DISASSEMBLY AT DESIGN TIME Speaker: Hedi Fendri, ALaRI, Universita della Svizzera italiana, CH Authors: Hedi Fendri1, Marco Macchetti2, Jerome Perrine2 and Mirjana Stojilovic3 1ALaRI, Universita della Svizzera italiana, CH; 2Kudelski Group, CH; 3EPFL, CH Abstract Side-channel CPU disassembly is a side-channel attack that allows an adversary to recover instructions executed by a processor. Not only does such an attack compromise code confidentiality, it can also reveal critical information on the system’s internals. Being easily accessible to a vast number of end users, modern embedded devices are highly vulnerable against disassembly attacks. To protect them, designers deploy countermeasures and verify their efficiency in security laboratories. Clearly, any vulnerability discovered at that point, after the integrated circuit has been manufactured, represents an important setback. In this paper, we address the above issues in two steps: Firstly, we design a framework that takes a design netlist and outputs simulated power side-channel traces, with the goal of assessing the vulnerability of the device at design time. Secondly, we propose a novel side-channel disassembler, based on multilayer perceptron and sparse dictionary learning for feature engineering. Experimental results on simulated and measured side-channel traces of two commercial RISC-V devices, both working on operating frequencies of at least 100 MHz, demonstrate that our disassembler can recognize CPU instructions with success rates of 96.01% and 93.16%, respectively. |
14:34 CET | 17.2.2 | (Best Paper Award Candidate) A CROSS-PLATFORM CACHE TIMING ATTACK FRAMEWORK VIA DEEP LEARNING Speaker: Ruyi Ding, Northeastern University, US Authors: Ruyi Ding, Ziyue Zhang, Xiang Zhang, Cheng Gongye, Yunsi Fei and A. Adam Ding, Northeastern University, US Abstract While deep learning methods have been adopted in power side-channel analysis, they have not been applied to cache timing attacks due to the limited dimension of cache timing data. This paper proposes a persistent cache monitor based on cache line flushing instructions, which runs concurrently to a victim execution and captures detailed memory access patterns in high-dimensional timing traces. We discover a new cache timing side-channel across both inclusive and non-inclusive caches, different from the traditional "Flush+Flush" timing leakage. We then propose a non-profiling differential deep learning analysis strategy to exploit the cache timing traces for key recovery. We further propose a framework for cross-platform cache timing attack via deep learning. Knowledge learned from profiling a common reference device can be transferred to build models to attack many other victim devices, even in different processor families. We take the OpenSSL AES-128 encryption algorithm as an example victim and deploy an asynchronous cache attack. We target three different devices from Intel, AMD, and ARM processors. We examine various scenarios for assigning the teacher role to one device and the student role to other devices and evaluate the cross-platform deep-learning attack framework. Experimental results show that this new attack is easily extendable to victim devices and is more effective than attacks without any prior knowledge. |
14:38 CET | 17.2.3 | DESIGN OF AI TROJANS FOR EVADING MACHINE LEARNING-BASED DETECTION OF HARDWARE TROJANS Speaker: Prabhat Mishra, University of Florida, US Authors: Zhixin Pan and Prabhat Mishra, University of Florida, US Abstract The globalized semiconductor supply chain significantly increases the risk of exposing System-on-Chip (SoC) designs to malicious implants, popularly known as hardware Trojans. Traditional simulation-based validation is unsuitable for detection of carefully-crafted hardware Trojans with extremely rare trigger conditions. While machine learning (ML) based Trojan detection approaches are promising due to their scalability as well as detection accuracy, ML methods themselves are vulnerable from Trojan attacks. In this paper, we propose a robust backdoor attack on ML-based HT detection algorithms to demonstrate this serious vulnerability. The proposed framework is able to design an AI Trojan and implant it inside the ML model that can be triggered by specific inputs. Experimental results demonstrate that the proposed AI Trojans can bypass state-of-the-art defense algorithms. Moreover, our approach provides a fast and cost-effective solution in achieving 100% attack success rate that significantly outperforms state-of-the art approaches based on adversarial attacks. |
14:42 CET | 17.2.4 | DIP LEARNING ON CAS-LOCK: USING DISTINGUISHING INPUT PATTERNS FOR ATTACKING LOGIC LOCKING Speaker: Akashdeep Saha, Indian Institute of Technology, Kharagpur, IN Authors: Akashdeep Saha1, Urbi Chatterjee2, Debdeep Mukhopadhyay3 and Rajat Subhra Chakraborty4 1Indian Institute of Technology, Kharagpur, IN; 2Indian Institute of Technology Kanpur, IN; 3Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, IN; 4Associate Professor, Computer Science and Engineering, IIT Kharagpur, IN Abstract The globalization of the integrated circuit (IC) manufacturing industry has lured the adversary to come up with numerous malicious activities in the IC supply chain. Logic locking has risen to prominence as a proactive defense strategy against such threats. CAS-Lock (proposed in CHES'20) is an advanced logic locking technique that harnesses the concept of single-point function in providing SAT-attack resiliency. It is claimed to be powerful and efficient enough in mitigating existing state-of-the-art attacks against logic locking techniques. Despite the security robustness of CAS-Lock as claimed by the authors, we expose a serious vulnerability and by exploiting the same we devise a novel attack algorithm against CAS-Lock. The proposed attack can not only reveal the correct key but also the exact AND/OR structure of the implemented CAS-Lock design along with all the key gates utilized in both the blocks of CAS-Lock. It simply relies on the externally observable Distinguishing Input Patterns (DIPs) pertaining to a carefully chosen key simulation of the locked design without the requirement of structural analysis of any kind of the locked netlist. Our attack is successful against various AND/OR cascaded-chain configurations of CAS-Lock and reports a 100% success rate in recovering the correct key. It has an attack complexity of O(n), where n denotes the number of DIPs obtained for an incorrect key simulation. |
14:46 CET | 17.2.5 | MUXLINK: CIRCUMVENTING LEARNING-RESILIENT MUX-LOCKING USING GRAPH NEURAL NETWORK-BASED LINK PREDICTION Speaker: Lilas Alrahis, New York University Abu Dhabi, AE Authors: Lilas Alrahis1, Satwik Patnaik2, Muhammad Shafique1 and Ozgur Sinanoglu1 1New York University Abu Dhabi, AE; 2Texas A&M University, US Abstract Logic locking has received considerable interest as a prominent technique for protecting the design intellectual property from untrusted entities, especially the foundry. Recently, machine learning (ML)-based attacks have questioned the security guarantees of logic locking, and have demonstrated considerable success in deciphering the secret key without relying on an oracle, hence, proving to be very useful for an adversary in the fab. Such ML-based attacks have triggered the development of learning-resilient locking techniques. The most advanced state-of-the-art deceptive MUX-based locking (D-MUX) and the symmetric MUX-based locking techniques have recently demonstrated resilience against existing ML-based attacks. Both defense techniques obfuscate the design by inserting key-controlled MUX logic, ensuring that all the secret inputs to the MUXes are equiprobable. In this work, we show that these techniques primarily introduce local and limited changes to the circuit without altering the global structure of the design. By leveraging this observation, we propose a novel graph neural network (GNN)-based link prediction attack, MuxLink, that successfully breaks both the D-MUX and symmetric MUX-locking techniques, relying only on the underlying structure of the locked design, i.e., in an oracle-less setting. Our trained GNN model learns the structure of the given circuit and the composition of gates around the non-obfuscated wires, thereby generating meaningful link embeddings that help decipher the secret inputs to the MUXes. The proposed MuxLink achieves key prediction accuracy and precision up to 100% on D-MUX and symmetric MUX-locked ISCAS-85 and ITC-99 benchmarks, fully unlocking the designs. We open-source MuxLink [1]. |
14:50 CET | 17.2.6 | Q&A SESSION Authors: Emanuele Valea1 and Francesco Regazzoni2 1CEA LIST, FR; 2University of Amsterdam and ALaRI - USI, CH Abstract Questions and answers with the authors |
17.3 Algorithmic techniques for efficient and robust ML hardware
Date: Tuesday, 22 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Giulio Gambardella, Synopsys, IE
Session co-chair:
Tony Wu, Meta/Facebook, US
In this session we present results from five papers on algorithmic techniques for efficient and robust ML hardware. The first paper introduces a dynamic token-based compression technique for efficient acceleration of the attention mechanism in DNNs. The second paper sheds light on the negative effect that adversarial training has on the fault resilience of deep neural networks (DNNs) and proposes a simple weight-decay remedy that lets adversarially trained models maintain both adversarial robustness and fault resilience. The third paper proposes a joint variability- and quantization-aware DNN training algorithm and self-tuning strategy to overcome accuracy loss in highly quantized analog PIM-based models. The fourth paper presents a new training algorithm that converts deep neural networks to spiking neural networks with low latency and high spike sparsity, demonstrating 2.5-8X faster inference than prior SNN models. Finally, the last paper introduces a technique for zero-overhead ECC embedding in DNN models.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 17.3.1 | (Best Paper Award Candidate) DTQATTEN: LEVERAGING DYNAMIC TOKEN-BASED QUANTIZATION FOR EFFICIENT ATTENTION ARCHITECTURE Speaker: Tao Yang, Shanghai Jiao Tong University, CN Authors: Tao Yang, Dongyue Li, Zhuoran Song, Yilong Zhao, Fangxin Liu, Zongwu Wang, Zhezhi He and Li Jiang, Shanghai Jiao Tong University, CN Abstract Models based on the attention mechanism, i.e. transformers, have shown extraordinary performance in Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are still prohibitive for efficient inference at edge devices, even at data centers. To tackle this issue, we present an algorithm-architecture co-design with dynamic and mixed-precision quantization, DTQAtten. We present empirically that the tolerance to the noise varies from token to token in attention-based models. This finding leads us to quantize different tokens with mixed levels of bits. Thus, we design a compression framework that (i) dynamically quantizes tokens while they are forwarded in the models and (ii) jointly determines the ratio of each precision. Moreover, due to the dynamic mixed-precision tokens caused by our framework, previous matrix-multiplication accelerators (e.g. systolic array) cannot effectively exploit the benefit of the compressed attention computation. We thus design our accelerator with the Variable Speed Systolic-Array (VSSA) and propose an effective optimization strategy to alleviate the pipeline-stall problem in VSSA without hardware overhead. We conduct experiments with existing attention-based models, including BERT and GPT-2 on various language tasks. Our results show that DTQAtten outperforms the previous neural network accelerator Eyeriss by 13.12x in terms of speedup and 3.8x on average, in terms of energy-saving. Compared with the state-of-the-art attention accelerator SpAtten, our DTQAtten achieves at least 2.65x speedup and 3.38x energy efficiency improvement. |
14:34 CET | 17.3.2 | MIND THE SCALING FACTORS: RESILIENCE ANALYSIS OF QUANTIZED ADVERSARIALLY ROBUST CNNS Speaker: Nael Fasfous, TU Munich, DE Authors: Nael Fasfous1, Lukas Frickenstein2, Michael Neumeier1, Manoj Rohit Vemparala2, Alexander Frickenstein2, Emanuele Valpreda3, Maurizio Martina3 and Walter Stechele1 1TU Munich, DE; 2BMW Group, DE; 3Politecnico di Torino, IT Abstract As more deep learning algorithms enter safety-critical application domains, the importance of analyzing their resilience against hardware faults cannot be overstated. Most existing works focus on bit-flips in memory, fewer focus on compute errors, and almost none study the effect of hardware faults on adversarially trained convolutional neural networks (CNNs). In this work, we show that adversarially trained CNNs are more susceptible to failure due to hardware errors when compared to vanilla-trained models. We identify large differences in the quantization scaling factors of the CNNs which are resilient to hardware faults and those which are not. As adversarially trained CNNs learn robustness against input attack perturbations, their internal weight and activation distributions open a backdoor for injecting large magnitude hardware faults. We propose a simple weight decay remedy for adversarially trained models to maintain adversarial robustness and hardware resilience in the same CNN. We improve the fault resilience of an adversarially trained ResNet56 by 25% for large-scale bit-flip benchmarks on activation data while gaining slightly improved accuracy and adversarial robustness. |
14:38 CET | 17.3.3 | VARIABILITY-AWARE TRAINING AND SELF-TUNING OF HIGHLY QUANTIZED DNNS FOR ANALOG PIM Speaker: Zihao Deng, University of Texas at Austin, US Authors: Zihao Deng and Michael Orshansky, University of Texas at Austin, US Abstract DNNs deployed on analog processing in memory (PIM) architectures are subject to fabrication-time variability. We developed a new joint variability- and quantization-aware DNN training algorithm for highly quantized analog PIM-based models that is significantly more effective than prior work. It outperforms variability-oblivious and post-training quantized models on multiple computer vision datasets/models. For low-bitwidth models and high variation, the gain in accuracy is up to 35.7% for ResNet-18 over the best alternative. We demonstrate that, under a realistic pattern of within- and between-chip components of variability, training alone is unable to prevent large DNN accuracy loss (of up to 54% on CIFAR-100/ResNet-18). We introduce a self-tuning DNN architecture that dynamically adjusts layer-wise activations during inference and is effective in reducing accuracy loss to below 10%. |
14:42 CET | 17.3.4 | CAN DEEP NEURAL NETWORKS BE CONVERTED TO ULTRA LOW-LATENCY SPIKING NEURAL NETWORKS? Speaker: Gourav Datta, University of Southern California, US Authors: Gourav Datta and Peter Beerel, University of Southern California, US Abstract Spiking neural networks (SNNs), that operate via binary spikes distributed over time, have emerged as a promising energy efficient ML paradigm for resource-constrained devices. However, the current state-of-the-art (SOTA) SNNs require multiple time steps for acceptable inference accuracy, increasing spiking activity and, consequently, energy consumption. SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN). In this paper, we determine that SOTA conversion strategies cannot yield ultra low latency because they incorrectly assume that the DNN and SNN pre-activation values are uniformly distributed. We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN. The resulting SNNs have ultra low latency and high activation sparsity, yielding significant improvements in compute efficiency. In particular, we evaluate our framework on image recognition tasks from CIFAR-10 and CIFAR-100 datasets on several VGG and ResNet architectures. We obtain top-1 accuracy of 64.19% with only 2 time steps on the CIFAR-100 dataset with 159.2x lower compute energy compared to an iso-architecture standard DNN. Compared to other SOTA SNN models, our models perform inference 2.5-8x faster (i.e., with fewer time steps). |
14:46 CET | 17.3.5 | VALUE-AWARE PARITY INSERTION ECC FOR FAULT-TOLERANT DEEP NEURAL NETWORK Speaker: Seo-Seok Lee, Samsung Electronics, KR Authors: Seo-Seok Lee1 and Joon-Sung Yang2 1Samsung Electronics Co.Ltd, KR; 2Yonsei University, KR Abstract Deep neural networks (DNNs) are deployed on hardware devices and are widely used in various fields to perform inference from inputs. Unfortunately, hardware devices can become unreliable by incidents such as unintended process, voltage and temperature variations, and this can introduce the occurrence of erroneous weights. Prior study reports that the erroneous weights can cause a significant accuracy degradation. In safety-critical applications such as autonomous driving, it can bring catastrophic results. Retraining or fine-tuning can be used to adjust corrupted weights to prevent the accuracy degradation. However, training-based approaches would incur a significant computational overhead due to a massive size of training datasets and intensive training operations. Thus, this paper proposes a value-aware parity insertion error correction code (ECC) to recover erroneous weights with a reduced parity storage overhead and no additional training processes. Previous ECC-based reliability improvement methods, Weight Nulling and In-place Zero-space ECC, are compared with the proposed method. Experimental results demonstrate that DNNs with the value-aware parity insertion ECC can perform inference without the accuracy degradation, on average, in 122.5x and 15.1x higher bit error rate conditions over Weight Nulling and In-place Zero-space ECC, respectively. |
14:50 CET | 17.3.6 | Q&A SESSION Authors: Giulio Gambardella1 and Tony Wu2 1Synopsys, IE; 2Meta/Facebook, US Abstract Questions and answers with the authors |
17.4 Energy Efficiency with Emerging Technologies for the Edge and the Cloud
Date: Tuesday, 22 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Qinru Qiu, Syracuse University, US
Session co-chair:
Iraklis Anagnostopoulos, SIU, US
Papers in this session discuss approaches for reducing energy on edge devices and in the cloud by optimizing hardware and software architectures and memory management methodologies. The first paper presents a precision-scalable architecture for edge DNN accelerators. The second paper proposes energy-efficient classification for an event-based vision sensor using ternary convolutional networks. The third paper addresses the read/write overheads of NVM with an extendible hashing methodology. The fourth paper presents a new memory allocation technique based on data-structure refinement for hybrid NVM/DRAM systems. The fifth paper reduces energy cost and total cost of ownership by replacing x86-based rack servers with a large number of ARM-based single-board computers for serverless Function-as-a-Service platforms.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 17.4.1 | A PRECISION-SCALABLE ENERGY-EFFICIENT BIT-SPLIT-AND-COMBINATION VECTOR SYSTOLIC ACCELERATOR FOR NAS-OPTIMIZED DNNS ON EDGE Speaker: Junzhuo Zhou, Southern University of Science and Technology, CN Authors: Kai Li, Junzhuo Zhou, Yuhang Wang, Junyi Luo, Zhengke Yang, Shuxin Yang, Wei Mao, Mingqiang Huang and Hao Yu, Southern University of Science and Technology, CN Abstract Optimized model and energy-efficient hardware are both required for deep neural networks (DNNs) in edge-computing area. Neural architecture search (NAS) methods are employed for DNN model optimization with resulted multi-precision networks. Previous works have proposed low-precision-combination (LPC) and high-precision-split (HPS) methods for multi-precision networks, which are not energy-efficient for precision-scalable vector implementation. In this paper, a bit-split-and-combination (BSC) based vector systolic accelerator is developed for a precision-scalable energy-efficient convolution on edge. The maximum energy efficiency of the proposed BSC vector processing element (PE) is up to 1.95x higher in 2-bit, 4-bit and 8-bit operations when compared with LPC and HPS PEs. Further with NAS optimized multi-precision CNN networks, the averaged energy efficiency of the proposed vector systolic BSC PE array achieves up to 2.18x higher in 2-bit, 4-bit and 8-bit operations than that of LPC and HPS PE arrays. |
14:34 CET | 17.4.2 | TERNARIZED TCN FOR μJ/INFERENCE GESTURE RECOGNITION FROM DVS EVENT FRAMES Speaker: Georg Rutishauser, ETH Zürich, CH Authors: Georg Rutishauser1, Moritz Scherer1, Tim Fischer1 and Luca Benini2 1ETH Zürich, CH; 2Università di Bologna and ETH Zürich, IT Abstract Dynamic Vision Sensors (DVS) offer the opportunity to scale the energy consumption in image acquisition proportionally to the activity in the captured scene by only transmitting data when the captured image changes. Their potential for energy-proportional sensing makes them highly attractive for severely energy-constrained sensing nodes at the edge. Most approaches to the processing of DVS data employ Spiking Neural Networks to classify the input from the sensor. In this paper, we propose an alternative, event frame-based approach to the classification of DVS video data. We assemble ternary video frames from the event stream and process them with a fully ternarized Temporal Convolutional Network which can be mapped to CUTIE, a highly energy-efficient Ternary Neural Network accelerator. The network mapped to the accelerator achieves a classification accuracy of 94.5 %, matching the state of the art for embedded implementations. We implement the processing pipeline in a modern 22 nm FDX technology and perform post-synthesis power simulation of the network running on the system, achieving an inference energy of 1.7 μJ, which is 647× lower than previously reported results based on Spiking Neural Networks. |
14:38 CET | 17.4.3 | REH: REDESIGNING EXTENDIBLE HASHING FOR COMMERCIAL NON-VOLATILE MEMORY Speaker: Zhengtao Li, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CN Authors: Zhengtao Li, Zhipeng Tan and Jianxi Chen, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CN Abstract Emerging Non-volatile Memory (NVM) is attractive because of its byte-addressability, durability, and DRAM-scale latency. Hashing indexes have been extensively used to provide fast query services in the storage system. Recent research proposes crash-consistent and write-optimized hashing indexes for NVM. However, existing NVM-based hashing indexes suffer from limited scalability when running on a Commercial Non-Volatile Memory product, named Intel Optane DC Persistent Memory Module (DCPMM), due to the limited bandwidth of Optane DCPMM. To achieve a high load factor, existing NVM-based hashing indexes often evict an existing item to its alternative position, which incurs extra write and will consume the limited bandwidth. Moreover, the lock operations and metadata updates further saturate the limited bandwidth and prevent the hash table from scaling. In order to achieve scalability performance as well as a high load factor for the NVM-based hashing index, we design a new persistent hashing index, called REH, based on extendible hashing. REH (1) proposes a selective persistence scheme that stores buckets in NVM and places directory and metadata in DRAM to reduce both unnecessary NVM reads and writes, (2) uses 256B sized-buckets, as 256B is the internal data access size in Optane DCPMM, and the buckets are directly pointed to by directory entries, (3) leverages fingerprinting to further reduce unnecessary NVM reads, (4) employs failure-atomic bucket split to reduce bucket split overhead. Evaluations show that REH outperforms the state-of-the-art NVM-based hashing indexes by up to 1.68∼7.78×. In the meantime, REH can achieve a high load factor. |
14:42 CET | 17.4.4 | MEMORY MANAGEMENT METHODOLOGY FOR APPLICATION DATA STRUCTURE REFINEMENT AND PLACEMENT ON HETEROGENEOUS DRAM/NVM SYSTEMS Speaker: Manolis Katsaragakis, National TU Athens and KU Leuven, GR Authors: Manolis Katsaragakis1, Lazaros Papadopoulos2, Christos Baloukas2 and Dimitrios Soudris2 1National TU Athens and KU Leuven, GR; 2National TU Athens, GR Abstract Memory systems that combine multiple memory technologies with different performance and energy characteristics are becoming mainstream. Existing data placement strategies evolve to map application requirements to the underlying heterogeneous memory systems. In this work, we propose a memory management methodology that leverages a data structure refinement approach to improve data placement results, in terms of execution time and energy consumption. The methodology is evaluated on three machine learning algorithms deployed on various NVM technologies, both on emulated and on real DRAM/NVM systems. Results show execution time improvements of up to 57% and energy consumption gains of up to 41%. |
14:46 CET | 17.4.5 | MICROFAAS: ENERGY-EFFICIENT SERVERLESS ON BARE-METAL SINGLE-BOARD COMPUTERS Speaker: Anthony Byrne, Boston University, US Authors: Anthony Byrne1, Yanni Pang1, Allen Zou1, Shripad Nadgowda2 and Ayse Coskun1 1Boston University, US; 2IBM T.J. Watson Research Center, US Abstract Serverless function-as-a-service (FaaS) platforms offer a radically-new paradigm for cloud software development, yet the hardware infrastructure underlying these platforms is based on a decades-old design pattern. The rise of FaaS presents an opportunity to reimagine cloud infrastructure to be more energy-efficient, cost-effective, reliable, and secure. In this paper, we show how replacing handfuls of x86-based rack servers with hundreds of ARM-based single-board computers could lead to a virtualization-free, energy-proportional cloud that achieves this vision. We call our systematically-designed implementation MicroFaaS, and we conduct a thorough evaluation and cost analysis comparing MicroFaaS to a throughput-matched FaaS platform implemented in the style of conventional virtualization-based cloud systems. Our results show a 5.6x increase in energy efficiency and 34.2% decrease in total cost-of-ownership compared to our baseline. |
14:50 CET | 17.4.6 | Q&A SESSION Authors: Qinru Qiu1 and Iraklis Anagnostopoulos2 1Syracuse University, US; 2Southern Illinois University Carbondale, US Abstract Questions and answers with the authors |
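As a purely illustrative aside on 17.4.1 above, the sketch below shows the arithmetic idea behind a bit-split-and-combination multiply: an 8-bit multiplication is decomposed into 2-bit partial products that are shifted and accumulated. The function names and chunk width are our own choices for the example and do not describe the BSC PE hardware itself.

```python
# Toy numeric sketch of bit-split-and-combine multiplication (not the BSC PE itself):
# an 8-bit x 8-bit unsigned multiply built from 2-bit sub-multiplications.

def split_2bit(value, n_chunks=4):
    """Split an unsigned integer into little-endian 2-bit chunks."""
    return [(value >> (2 * i)) & 0b11 for i in range(n_chunks)]

def bsc_multiply(a, b, n_chunks=4):
    """Multiply two 8-bit unsigned operands via shifted 2-bit partial products."""
    acc = 0
    for i, ai in enumerate(split_2bit(a, n_chunks)):
        for j, bj in enumerate(split_2bit(b, n_chunks)):
            acc += (ai * bj) << (2 * (i + j))  # shift-and-accumulate each partial product
    return acc

# Exhaustive check over all 8-bit operand pairs.
assert all(bsc_multiply(a, b) == a * b for a in range(256) for b in range(256))
```

Lower-precision operands simply use fewer chunks, which is what makes a split-and-combine organization precision-scalable.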
17.5 Putting Place and Route research on the right track
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Behjat Laleh, University of Calgary, CA
Session co-chair:
Jens Lienig, TU Dresden, DE
This session discusses how placement and routing can be done more efficiently. The first paper presents a global routing framework running on hybrid CPU-GPU platforms with a heterogeneous task scheduler achieving considerable speedup over sequential implementations and state-of-the-art routers. The second paper addresses the track assignment during detailed routing. The third and the fourth papers show that routing violations can be reduced if the root causes of the problems are tackled during the placement stage. The last paper brings us back to the use of GPU and CPU and discusses how they can be employed during legalization to reduce runtime.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 17.5.1 | (Best Paper Award Candidate) FASTGR: GLOBAL ROUTING ON CPU-GPU WITH HETEROGENEOUS TASK GRAPH SCHEDULER Speaker: Siting Liu, The Chinese University of Hong Kong, HK Authors: Siting Liu1, Peiyu Liao1, Rui Zhang2, Zhitang Chen3, Wenlong Lv4, Yibo Lin5 and Bei Yu1 1The Chinese University of Hong Kong, HK; 2HiSilicon Technologies Co. Ltd., CN; 3Huawei Noah's Ark Lab, HK; 4Huawei Noah's Ark Lab, CN; 5Peking University, CN Abstract Routing is an essential step to integrated circuits (IC) design closure. With the rapid increase of design scales, routing has become the runtime bottleneck in the physical design flow. Thus, accelerating routing becomes a vital and urgent task for IC design automation. This paper proposes a global routing framework running on hybrid CPU-GPU platforms with a heterogeneous task scheduler and a GPU-accelerated pattern routing algorithm. We demonstrate that the task scheduler can lead to 2.307× speedup compared with the widely-adopted batch-based parallelization strategy on CPU and the GPU-accelerated pattern routing algorithm can contribute to 10.877× speedup over the sequential algorithm on CPU. Finally, the combined techniques can achieve 2.426× speedup without quality degradation compared with the state-of-the-art global router. |
14:34 CET | 17.5.2 | TRADER: A PRACTICAL TRACK-ASSIGNMENT-BASED DETAILED ROUTER Speaker: Zhen Zhuang, Fuzhou University, CN Authors: Zhen Zhuang1, Genggeng Liu1, Tsung-Yi Ho2, Bei Yu2 and Wenzhong Guo1 1Fuzhou University, CN; 2The Chinese University of Hong Kong, HK Abstract As the last stage of VLSI routing, detailed routing should consider complicated design rules in order to meet the manufacturability of chips. With the continuous development of VLSI technology node, the design rules are changing and increasing which makes detailed routing a hard task. In this paper, we present a practical track-assignment-based detailed router to deal with the most representative design rules in modern designs. The proposed router consists of four major stages: (1) a graph-based track assignment algorithm is proposed to optimize the design rule violations of an entire die area; (2) an effective rip-up and reroute method is used to reduce the design rule violations in local regions; (3) a segment migration algorithm is proposed to reduce short violations; and (4) a stack via optimization technique is proposed to reduce minimum area violations. Practical benchmarks from 2019 ISPD contest are used to evaluate the proposed router. Compared with the state-of-the-art detailed router, Dr. CU 2.0, the number of violations can be reduced by up to 35.11% with an average reduction rate of 10.08%. The area of short can be reduced by up to 61.49% with an average reduction rate of 44.80%. |
14:38 CET | 17.5.3 | CR&P: AN EFFICIENT CO-OPERATION BETWEEN ROUTING AND PLACEMENT Speaker: Erfan Aghaeekiasaraee, University of Calgary, CA Authors: Erfan Aghaeekiasaraee1, Aysa Fakheri Tabrizi1, Tiago Fontana2, Renan Netto3, Sheiny Almeida3, Upma Gandhi1, Jose Guntzel3, David Westwick1 and Laleh Behjat1 1University of Calgary, CA; 2Federal University of Santa Catarina (UFSC), BR; 3Federal University of Santa Catarina, BR Abstract Placement and Routing (P&R) are two main steps of the physical design flow implementation. Traditionally, because of their complexity, these two steps are performed separately. But the implementation of the physical design in advanced technology nodes shows that the performance of these two steps is tied to each other. Therefore creating efficient co-operation between the routing and placement steps has become a hot topic in Electronic Design Automation (EDA). In this work, to achieve an efficient collaboration between the routing and placement engines, an iterative replacement and rerouting framework facilitated with an Integer Linear Programming (ILP)-based legalizer is proposed and tested on the ACM/IEEE International Symposium on Physical Design (ISPD) 2018 contest's benchmarks. Numerical results show that the proposed framework can improve detailed routing vias and wirelength by 2.06% and 0.14% on average in a reasonable runtime without adding new Design Rule Violations (DRVs). The proposed framework can be considered as an add-on to the physical design flow between global routing and detailed routing. |
14:42 CET | 17.5.4 | PIN ACCESSIBILITY-DRIVEN PLACEMENT OPTIMIZATION WITH ACCURATE AND COMPREHENSIVE PREDICTION MODEL Speaker: Suwan Kim, Seoul National University, KR Authors: Suwan Kim and Taewhan Kim, Seoul National University, KR Abstract The significantly increased density of pins of standard cells and the reduced number of routing tracks at sub-10nm nodes have made the pin access problem in detailed routing very difficult. To alleviate this pin accessibility problem in detailed routing, recent works have proposed to apply small perturbations such as cell shifting, cell flipping, and adjacent-cell swapping in the detailed placement stage. Here, an essential element for the success of pin accessibility aware detailed placement is the cost function employed, which should be sufficiently accurate in predicting the degree of routing difficulty in accessing pins. In this work, we propose a new model of cost function that is comprehensively devised to overcome the limitations of the prior ones. Precisely, unlike the conventional cost functions, our proposed cost function model is based on empirical routing data in order to fully reflect the potential outcomes of detailed routing. Through experiments with benchmark circuits, it is shown that using our proposed cost function in detailed placement reduces the routing errors by 44% on average, while the existing cost functions reduce the routing errors on average by at most 15%. |
14:46 CET | 17.5.5 | MIXED-CELL-HEIGHT LEGALIZATION ON CPU-GPU HETEROGENEOUS SYSTEMS Speaker: Haoyu Yang, NVIDIA Corp., US Authors: Haoyu Yang1, Kit Fung2, Yuxuan Zhao2, Yibo Lin3 and Bei Yu4 1NVIDIA Corp., US; 2Chinese University of Hong Kong, HK; 3Peking University, CN; 4The Chinese University of Hong Kong, HK Abstract Legalization refines post-global-placement cell locations to reconcile design constraints and parameters. These include placement fence regions, power/ground rail alignments, timing, wirelength, etc. In advanced technology nodes, designs can easily contain millions of multiple-row standard cells, which challenges the scalability of modern legalization algorithms. In this paper, for the first time, we investigate dedicated legalization algorithms on heterogeneous platforms, which promise intelligent usage of CPU and GPU resources and hence provide new algorithm design methodologies for large-scale physical design problems. Experimental results on ICCAD 2017 and ISPD 2015 contest benchmarks demonstrate the effectiveness and the efficiency of the proposed algorithm, compared to the state-of-the-art legalization solution for mixed-cell-height designs. |
14:50 CET | 17.5.6 | Q&A SESSION Authors: Laleh Behjat1 and Jens Lienig2 1University of Calgary, CA; 2TU Dresden, DE Abstract Questions and answers with the authors |
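For readers unfamiliar with pattern routing, the technique accelerated on GPU in 17.5.1 above, the toy sketch below routes a single two-pin net by choosing the cheaper of its two L-shaped candidates on a congestion grid. It is only a minimal illustration under assumed names and costs, and omits everything that makes FastGR fast (the task-graph scheduler, batching, and fallback routing).

```python
# Minimal L-shaped pattern routing for one two-pin net on a congestion grid
# (an illustrative sketch, not the FastGR algorithm).

def l_route(src, dst, bend):
    """Grid cells covered by an L-shaped route going src -> bend -> dst."""
    cells = set()
    for (ax, ay), (bx, by) in ((src, bend), (bend, dst)):
        if ax == bx:   # vertical segment
            cells |= {(ax, y) for y in range(min(ay, by), max(ay, by) + 1)}
        else:          # horizontal segment (ay == by for an L route)
            cells |= {(x, ay) for x in range(min(ax, bx), max(ax, bx) + 1)}
    return cells

def route_two_pin_net(src, dst, congestion):
    """Pick the cheaper of the two L patterns under a per-cell congestion cost."""
    candidates = [l_route(src, dst, (dst[0], src[1])),   # horizontal first
                  l_route(src, dst, (src[0], dst[1]))]   # vertical first
    return min(candidates, key=lambda cells: sum(congestion.get(c, 0.0) for c in cells))

congestion = {(1, 0): 5.0, (2, 0): 5.0}                  # a congested row near the source
print(route_two_pin_net((0, 0), (3, 3), congestion))     # the vertical-first L wins
```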
17.6 Multi-Partner Projects – Session 1
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Leticia Maria Bolzani Poehls, RWTH Aachen University, DE
Session co-chair:
Maksim Jenihhin, Tallinn UT, EE
The session is dedicated to multi-partner innovative and high-tech research projects addressing the DATE 2022 topics. The types of collaboration covered are projects funded by EU schemes (H2020, ESA, EIC, MSCA, COST, etc.), nationally- and regionally-funded projects, and collaborative research projects funded by industry. Depending on the stage of the project, the papers present the novelty of the project concepts, the relevance of the technical objectives to the DATE community, technical highlights of the project results, and insights into the lessons learnt in the project or the work remaining until the end of the project. In particular, this session discusses projects for automotive and safety-critical systems covering security aspects, RISC-V architecture platforms and cross-layer concepts for reliability analysis.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 17.6.1 | A COMPREHENSIVE SOLUTION FOR SECURING CONNECTED AND AUTONOMOUS VEHICLES Speaker: Theocharis Theocharides, KIOS Research and Innovation Center of Excellence, University of Cyprus, CY Authors: Mohsin Kamal1, Christos Kyrkou2, Nikos Piperigkos3, Andreas Papandreou3, Andreas Kloukiniotis3, Jordi Casademont4, Natalia Mateu5, Daniel Castillo5, Rodrigo Rodriguez6, Nicola Durante6, Peter Hofmann7, Petros Kapsalas8, Aris Lalos9, Konstantinos Moustakas9, Christos Laoudias1, Theocharis Theocharides2 and Georgios Ellinas10 1KIOS Research and Innovation Center of Excellence, University of Cyprus, CY; 2University of Cyprus, CY; 3Department of Electrical and Computer Engineering, University of Patras, Greece, GR; 4Universitat Politecnica de Catalunya and Fundacio i2CAT, Barcelona, ES; 5Nextium by Idneo, ES; 6Atos IT Solutions and Services Iberia S.L., Madrid, ES; 7Deutsche Telekom Security GmbH, T-Systems, Berlin, DE; 8Panasonic Automotive, Langen, DE; 9Department of Electrical and Computer Engineering, University of Patras, GR; 10Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, CY Abstract With the advent of Connected and Autonomous Vehicles (CAVs) comes the very real risk that these vehicles will be exposed to cyber-attacks by exploiting various vulnerabilities. This paper gives a technical overview of the H2020 CARAMEL project (currently in its intermediate stage), in which Artificial Intelligence (AI)-based cybersecurity for CAVs is the main goal. Most of the possible scenarios by which an adversary can generate attacks on CAVs are considered, such as attacks on camera sensors, GPS location, Vehicle-to-Everything (V2X) message transmission, the vehicle's On-Board Unit (OBU), etc. The countermeasures to these attacks and vulnerabilities are presented via the current results of the CARAMEL project, achieved by implementing the designed security algorithms. |
14:34 CET | 17.6.2 | PHYSICAL AND FUNCTIONAL REVERSE ENGINEERING CHALLENGES FOR ADVANCED SEMICONDUCTOR SOLUTIONS Speaker: Bernhard Lippmann, Infineon, DE Authors: Bernhard Lippmann1, Matthias Ludwig1, Johannes Mutter1, Ann-Christin Bette1, Alexander Hepp2, Johanna Baehr2, Martin Rasche3, Oliver Kellermann3, Horst Gieser4, Tobias Zweifel4 and Nicola Kovač4 1Infineon, DE; 2TU Munich, DE; 3RAITH, DE; 4Fraunhofer, DE Abstract Motivated by the threats of malicious modification and piracy arising from worldwide distributed supply chains, the goal of RESEC is the creation, verification, and optimization of a complete reverse engineering process for integrated circuits manufactured in technology nodes of 40 nm and below. Building upon the presentation of individual reverse engineering process stages, this paper connects analysis efforts and yields with their impact on hardware security, demonstrated on a design with implemented hardware Trojans. We outline the interim stage of our research activities and present our future targets linking chip design and physical verification processes. |
14:38 CET | 17.6.3 | DE-RISC: A COMPLETE RISC-V BASED SPACE-GRADE PLATFORM Speaker: Jaume Abella, Barcelona Supercomputing Center, ES Authors: Nils-Johan Wessman1, Fabio Malatesta1, Stefano Ribes1, Jan Andersson1, Antonio Garcia-Vilanova2, Miguel Masmano2, Vicente Nicolau2, Paco Gomez2, Jimmy Le Rhun3, Sergi Alcaide4, Guillem Cabo5, Francisco Bas4, Pedro Benedicte5, Fabio Mazzocchetti5 and Jaume Abella5 1CAES Gaisler, SE; 2fentISS, ES; 3Thales Research and Technology, FR; 4Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES; 5Barcelona Supercomputing Center, ES Abstract The H2020 EIC-FTI De-RISC project develops a RISC-V space-grade platform to jointly respond to several emerging, as well as longstanding needs in the space domain such as: (1) higher performance than that of monocore and basic multicore space-grade processors in the market; (2) access to an increasingly rich software ecosystem rather than sticking to the slowly fading SPARC and PowerPC-based ones; (3) freedom (or drastic reduction) of export and license restrictions imposed by commercial ISAs such as ARM; and (4) improved support for the design and validation of safety-related real-time applications, (5) being the platform with software qualified and hardware designed per established space industry standards. De-RISC partners have set up the different layers of the platform during the first phases of the project. However, they have recently boosted integration and assessment activities. This paper introduces the De-RISC space platform, presents recent progress such as enabling virtualization and software qualification, new MPSoC features, and use case deployment and evaluation, including a comparison against other commercial platforms. Finally, this paper introduces the ongoing activities that will lead to the hardware and fully qualified software platform at TRL8 on FPGA by September 2022. |
14:42 CET | 17.6.4 | THE SCALE4EDGE RISC-V ECOSYSTEM Speaker: Wolfgang Ecker, Infineon Technologies AG, DE Authors: Wolfgang Ecker1, Milos Krstic2, Andreas Mauderer3, Eyck Jentzsch4, Andreas Koch5, Wolfgang Müller6, Vladimir Herdt7, Daniel Mueller-Gritschneder8, Rafael Stahl8, Kim Grüttner9, Jörg Bormann10, Wolfgang Kunz11, Reinhold Heckmann12, Ralf Wimmer13, Bernd Becker14, Philipp Scholl14, Oliver Bringmann15, Johannes Partzsch16 and Christian Mayr16 1Infineon Technologies AG, DE; 2IHP, DE; 3Robert Bosch GmbH, DE; 4MINRES Technologies GmbH, DE; 5TU Darmstadt, DE; 6Paderborn University, DE; 7University Bremen, DE; 8TU Munich, DE; 9OFFIS - Institute for Information Technology, DE; 10Siemens EDA, DE; 11TU Kaiserslautern, DE; 12AbsInt Angewandte Informatik GmbH, DE; 13Concept Engineering GmbH, DE; 14University of Freiburg, DE; 15University of Tuebingen / FZI, DE; 16TU Dresden, DE Abstract This paper introduces the project Scale4Edge. The project is focused on enabling an effective RISC-V ecosystem for optimization of edge applications. We describe the basic components of this ecosystem and introduce the envisioned demonstrators, which will be used in their evaluation. |
14:46 CET | 17.6.5 | XANDAR: EXPLOITING THE X-BY-CONSTRUCTION PARADIGM IN MODEL-BASED DEVELOPMENT OF SAFETY-CRITICAL SYSTEMS Speaker: Leonard Masing, Karlsruhe Institute of Technology, DE Authors: Leonard Masing1, Tobias Dörr1, Florian Schade2, Juergen Becker1, Georgios Keramidas3, Christos Antonopoulos3, Michail Mavropoulos3, Efstratios Tiganourias3, Vasilios Kelefouras3, Konstantinos Antonopoulos3, Nikolaos Voros3, Umut Durak4, Alexander Ahlbrecht4, Wanja Zaeske4, Christos Panagiotou5, Dimitris Karadimas5, Nico Adler6, Andreas Sailer6, Raphael Weber6, Thomas Wilhelm6, Geza Nemeth7, Fahad Siddiqui8, Rafiullah Khan8, Vahid Garousi8, Sakir Sezer8 and Victor Morales9 1Karlsruhe Institute of Technology, DE; 2Karlsruhe Institute of Technology, DE; 3University of Peloponnese, GR; 4German Aerospace Center (DLR), DE; 5AVN Innovative Technology Solutions Limited, CY; 6Vector Informatik GmbH, DE; 7Bayerische Motoren Werke Aktiengesellschaft, DE; 8Queen’s University, Belfast, GB; 9fentISS, ES Abstract Realizing desired properties “by construction” is a highly appealing goal in the design of safety-critical embedded systems. As verification and validation tasks in this domain are often both challenging and time-consuming, the by-construction paradigm is a promising solution to increase design productivity and reduce design errors. In the XANDAR project, partners from industry and academia develop a toolchain that will advance current development processes by employing a model-based X-by-Construction (XbC) approach. XANDAR defines a development process, metamodel extensions, a library of safety and security patterns, and investigates many further techniques for design automation, verification, and validation. The developed toolchain will use a hypervisor-based platform, targeting future centralized, AI-capable high-performance embedded processing systems. It is co-developed and validated in both an avionics use case for situation perception and pilot assistance as well as an automotive use case for autonomous driving. |
14:50 CET | 17.6.6 | FLODAM: CROSS-LAYER RELIABILITY ANALYSIS FLOW FOR COMPLEX HARDWARE DESIGNS Speaker: Angeliki Kritikakou, Univ Rennes, Inria, CNRS, IRISA, FR Authors: Angeliki Kritikakou1, Olivier Sentieys2, Guillaume Hubert3, Youri Helen4, Jean-francois Coulon5 and Patrice Deroux-Dauphin5 1Univ Rennes, Inria, CNRS, IRISA, FR; 2INRIA, FR; 3ONERA, FR; 4DGA, FR; 5Temento, FR Abstract Modern technologies make hardware designs more and more sensitive to radiation particles and related faults. As a result, analysing the behavior of a system under radiation-induced faults has become an essential part of the system design process. Existing approaches either focus on analysing the radiation impact at the lower hardware design layers, without further propagating any radiation-induced fault to the system execution, or analyse system reliability at higher hardware or application layers, based on fault models that are agnostic of the fabrication technology and the radiation environment. FLODAM combines the benefits of existing approaches by providing a novel cross-layer reliability analysis from the semiconductor layer up to the application layer, able to quantify the risks of faults under a given context, taking into account the environmental conditions, the physical hardware design and the application under study. |
14:54 CET | 17.6.7 | Q&A SESSION Authors: Leticia Maria Bolzani Poehls1 and Maksim Jenihhin2 1RWTH Aachen University, DE; 2Tallinn University of Technology, EE Abstract Questions and answers with the authors |
18.1 Domain-specific co-design: From sensors to graph analytics
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Jeronimo Castrillon, TU Dresden, DE
Session co-chair:
Paula Herber, WWU Munster, DE
This session demonstrates how domain-specific knowledge can be leveraged to design algorithms and micro-architectures with improved computational efficiency. The presentations touch upon CNN optimizations, custom vectorization for sparse solvers and cache-aware data management for graph analytics. For instance, the authors exploit the structure of matrices in circuit simulation to better use modern vector instructions, propose highly energy-efficient architectures for spiking neural networks by modifying the order in which loops are processed, design predictors to reduce CNN operations at runtime, and improve the utilization of the memory subsystem by judiciously bypassing hierarchy levels for graph analytics. The session also describes how a generative adversarial network, trained on input and output traffic patterns, can detect hardware Trojans in router designs.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 18.1.1 | SNE: AN ENERGY-PROPORTIONAL DIGITAL ACCELERATOR FOR SPARSE EVENT-BASED CONVOLUTIONS Speaker: Alfio Di Mauro, ETH Zürich, CH Authors: Alfio Di Mauro1, Arpan Prasad1, Zhikai Huang1, Matteo Spallanzani1, Francesco Conti2 and Luca Benini3 1ETH Zürich, CH; 2University of Bologna, IT; 3Università di Bologna and ETH Zürich, IT Abstract Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth. To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5TOP/s/W digital accelerator capable of performing 4-bits-quantized event-based convolutional neural networks (eCNN). Compared to standard convolutional engines, our accelerator performs a number of operations proportional to the number of events contained into the input data stream, ultimately achieving a high energy-to-information processing proportionality. On the IBM-DVS-Gesture dataset, we report 80uJ/inf to 261uJ/inf, respectively, when the input activity is 1.2% and 4.9%. Our accelerator consumes 0.221pJ/SOP, to the best of our knowledge it is the lowest energy/OP reported on a digital neuromorphic engine. |
15:44 CET | 18.1.2 | LRP: PREDICTIVE OUTPUT ACTIVATION BASED ON SVD APPROACH FOR CNNS ACCELERATION Speaker: Xinxin Wu, Institute of Computing Technology, Chinese Academy of Sciences, CN Authors: Xinxin Wu, Zhihua Fan, Tianyu Liu, Wenming Li, Xiaochun Ye and Dongrui Fan, Institute of Computing Technology, Chinese Academy of Sciences, CN Abstract Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in a wide range of applications. CNNs contain millions of parameters, and their large number of computations challenges hardware design. In this paper, we take advantage of the output activation sparsity of CNNs to reduce the execution time and energy consumption of the network. We propose Low Rank Prediction (LRP), an effective prediction method that leverages the output activation sparsity. LRP first predicts the output activation polarity of the convolutional layer based on a singular value decomposition (SVD) of the convolution kernel. It then uses the predicted negative values to skip invalid computations in the original convolution. In addition, an effective accelerator, LRPPU, is proposed to take advantage of sparsity to achieve network inference acceleration. Experiments show that our LRPPU achieves 1.48× speedup and 2.02× energy reduction compared with dense networks with slight loss of accuracy. Also, it achieves on average 2.57× speedup over Eyeriss and has similar performance and less accuracy loss compared with SnaPEA. |
15:48 CET | 18.1.3 | EXPLOITING ARCHITECTURE ADVANCES FOR SPARSE SOLVERS IN CIRCUIT SIMULATION Speaker: Zhiyuan Yan, Institute of Computing Technology, Chinese Academy of Sciences, CN Authors: Zhiyuan Yan1, Biwei Xie1, Xingquan Li2 and Yungang Bao1 1Institute of Computing Technology, Chinese Academy of Sciences, CN; 2Peng Cheng Laboratory, CN Abstract Sparse direct solvers provide vital functionality for a wide variety of scientific applications. The dominant part of the sparse direct solver, LU factorization, suffers heavily from the irregularity of sparse matrices. Meanwhile, the specific characteristics of sparse solvers in circuit simulation and the unique sparse pattern of circuit matrices provide more design space but also great challenges. In this paper, we propose a sparse solver named FLU and re-examine the performance of LU factorization from the perspectives of vectorization, parallelization, and data locality. To improve vectorization efficiency and data locality, FLU introduces a register-level supernode computation method by delicately manipulating data movement. With alternating multiple-column computation, FLU further reduces off-chip memory accesses greatly. Furthermore, we implement a fine-grained elimination-tree-based parallelization scheme to fully exploit task-level parallelism. Compared with PARDISO and NICSLU, experimental results show that FLU achieves a speedup of up to 19.51× (3.86× on average) and 2.56× (1.66× on average) on Intel Xeon, respectively. |
15:52 CET | 18.1.4 | DATA-AWARE CACHE MANAGEMENT FOR GRAPH ANALYTICS Speaker: Varun Venkitaraman, Indian Institute of Technology Bombay, IN Authors: Neelam Sharma1, Varun Venkitaraman1, Newton Singh2, Vikash Kumar2, Shubham Singhania2 and Chandan Kumar Jha2 1Indian Institute of Technology, Bombay, IN; 2IIT Bombay, IN Abstract Graph analytics is powering a wide variety of applications in the domains of cybersecurity, contact tracing, and social networking. It consists of various algorithms (or workloads) that investigate the relationships between entities involved in transactions, interactions, and organizations. CPU-based graph analytics is inefficient because their cache hierarchy performs poorly owing to highly irregular memory access patterns of graph workloads. Policies managing the cache hierarchy in such systems are ignorant to the locality demands of different data types within graph workloads, and therefore are suboptimal. In this paper, we conduct an in-depth data type aware characterization of graph workloads to better understand the cache utilization of various graph data types. We find that different levels of the cache hierarchy are more sensitive to the locality demands of certain graph data types than others. Hence, we propose GRACE, a graph data-aware cache management technique, to increase cache hierarchy utilization, thereby minimizing off-chip memory traffic and enhancing performance. Our thorough evaluations show that GRACE, when augmented with a vertex reordering algorithm, outperforms a recent cache management scheme by up to 1.4x, with up to 27% reduction in expensive off-chip memory accesses. Thus, our work demonstrates that awareness of different graph data types is critical for effective cache management in graph analytics. |
15:56 CET | 18.1.5 | AGAPE: ANOMALY DETECTION WITH GENERATIVE ADVERSARIAL NETWORK FOR IMPROVED PERFORMANCE, ENERGY, AND SECURITY IN MANYCORE SYSTEMS Speaker: Ke Wang, The George Washington University, US Authors: Ke Wang1, Hao Zheng2, Yuan Li1, Jiajun Li3 and Ahmed Louri1 1The George Washington University, US; 2University of Central Florida, US; 3Beihang University, CN Abstract The security of manycore systems has become increasingly critical. In system-on-chips (SoCs), Hardware Trojans (HTs) manipulate the functionalities of the routing components to saturate the on-chip network, degrade performance, and result in the leakage of sensitive data. Existing HT detection techniques, including runtime monitoring and state-of-the-art learning-based methods, are unable to timely and accurately identify the implanted HTs, due to the increasingly dynamic and complex nature of on-chip communication behaviors. We propose AGAPE, a novel Generative Adversarial Network (GAN)-based anomaly detection and mitigation method against HTs for secured on-chip communication. AGAPE learns the distribution of the multivariate time series of a number of NoC attributes captured by on-chip sensors under both HT-free and HT-infected working conditions. The proposed GAN can learn the potential latent interactions among different runtime attributes concurrently, accurately distinguish abnormal attacked situations from normal SoC behaviors, and identify the type and location of the implanted HTs. Using the detection results, we apply the most suitable protection techniques to each type of detected HTs instead of simply isolating the entire HT-infected router, with the aim to mitigate security threats as well as reducing performance loss. Simulation results show that AGAPE enhances the HT detection accuracy by 19%, reduces network latency and power consumption by 39% and 30%, respectively, as compared to state-of-the-art security designs. |
16:00 CET | 18.1.6 | Q&A SESSION Authors: Jeronimo Castrillon1 and Paula Herber2 1TU Dresden, DE; 2University of Münster, DE Abstract Questions and answers with the authors |
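To make the energy-proportionality argument of 18.1.1 above concrete, the sketch below shows an event-driven convolution whose work grows with the number of input events rather than with the frame size. It is only a conceptual illustration (function names and shapes are our own), not the SNE datapath.

```python
# Event-driven 2D convolution sketch: each event scatters a scaled copy of the
# kernel onto the output map, so the work is proportional to the event count.
import numpy as np

def event_conv2d(events, kernel, out_shape):
    """events: iterable of (row, col, polarity); kernel: odd-sized 2D array."""
    out = np.zeros(out_shape)
    kh, kw = kernel.shape
    rh, rw = kh // 2, kw // 2
    for r, c, p in events:
        for dr in range(-rh, rh + 1):
            for dc in range(-rw, rw + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < out_shape[0] and 0 <= cc < out_shape[1]:
                    out[rr, cc] += p * kernel[rh + dr, rw + dc]
    return out

# A single unit event reproduces the kernel around its location.
kernel = np.arange(9, dtype=float).reshape(3, 3)
out = event_conv2d([(2, 2, 1.0)], kernel, (5, 5))
assert np.allclose(out[1:4, 1:4], kernel)
```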
18.2 Memory-centric and neural network systems: architectures, tools, and profilers
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Mohamed M. Sabry Aly, Nanyang Technological University, SG
Session co-chair:
Huichu Liu, Meta, Inc., US
This session focuses on two domains: neural networks (NN) and processing-in-memory (PIM) systems. The first paper introduces a profiler to aid in the decision-making process of migrating tasks from CPUs to PIM. The second paper analyses the security and resilience of spiking neural network architectures. The third paper provides a framework for efficient design-space exploration of NN mapping to PIM fabrics. The fourth paper investigates circuit-level techniques to enhance nonlinear operations in SRAM-based NN kernels. The session also includes a tool for content-addressable memory and a hybrid in-memory computing architecture.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 18.2.1 | (Best Paper Award Candidate) PIMPROF: AN AUTOMATED PROGRAM PROFILER FOR PROCESSING-IN-MEMORY OFFLOADING DECISIONS Speaker: Yizhou Wei, University of Virginia, US Authors: Yizhou Wei1, Minxuan Zhou2, Sihang Liu1, Korakit Seemakhupt1, Tajana S. Rosing2 and Samira Khan1 1University of Virginia, US; 2UCSD, US Abstract Processing-in-memory (PIM) architectures reduce the data movement overhead by bringing computation closer to the memory. However, a key challenge is to decide which code regions of a program should be offloaded to PIM for the best performance. The goal of this work is to help programmers leverage PIM architectures by automatically profiling legacy workloads to find PIM-friendly code regions for offloading. We propose PIMProf, an automated profiling and offloading tool to determine PIM offloading regions for CPU-PIM hybrid architectures. PIMProf efficiently models the comprehensive cost related to PIM offloading and makes the offloading decision using an effective and computationally tractable algorithm. We demonstrate the effectiveness of PIMProf by evaluating the GAP graph benchmark suite and the PARSEC benchmark suite under different PIM and CPU configurations. Our evaluation shows that, compared to the CPU baseline and a PIM-only configuration, the offloading decisions by PIMProf provide 5.33x and 1.39x speedups in the GAP graph workloads, respectively, and 2.22x and 1.74x speedups in the PARSEC benchmarks, respectively. |
15:44 CET | 18.2.2 | ANALYSIS OF POWER-ORIENTED FAULT INJECTION ATTACKS ON SPIKING NEURAL NETWORKS Speaker: Karthikeyan Nagarajan, Pennsylvania State University, US Authors: Karthikeyan Nagarajan1, Junde Li1, Sina Sayyah Ensan1, Mohammad Nasim Imtiaz Khan2, Sachhidh Kannan3 and Swaroop Ghosh1 1Pennsylvania State University, US; 2Intel Corporation, US; 3Ampere Computing LLC, US Abstract Spiking Neural Networks (SNN) are quickly gaining traction as a viable alternative to Deep Neural Networks (DNN). In comparison to DNNs, SNNs are more computationally powerful and provide superior energy efficiency. SNNs, while exciting at first appearance, contain security-sensitive assets (e.g., neuron threshold voltage) and vulnerabilities (e.g., sensitivity of classification accuracy to neuron threshold voltage change) that adversaries can exploit. We investigate global fault injection attacks by employing external power supplies and laser-induced local power glitches to corrupt crucial training parameters such as spike amplitude and neuron's membrane threshold potential on SNNs developed using common analog neurons. We also evaluate the impact of power-based attacks on individual SNN layers for 0% (i.e., no attack) to 100% (i.e., whole layer under attack). We investigate the impact of the attacks on digit classification tasks and find that in the worst-case scenario, classification accuracy is reduced by 85.65%. We also propose defenses e.g., a robust current driver design that is immune to power-oriented attacks, improved circuit sizing of neuron components to reduce/recover the adversarial accuracy degradation at the cost of negligible area and 25% power overhead. We also present a dummy neuron-based voltage fault injection detection system with 1% power and area overhead. |
15:48 CET | 18.2.3 | GIBBON: EFFICIENT CO-EXPLORATION OF NN MODEL AND PROCESSING-IN-MEMORY ARCHITECTURE Speaker: Hanbo Sun, Tsinghua University, CN Authors: Hanbo Sun, Chenyu Wang, Zhenhua Zhu, Xuefei Ning, Guohao Dai, Huazhong Yang and Yu Wang, Tsinghua University, CN Abstract The memristor-based Processing-In-Memory (PIM) architectures have shown great potential to boost the computing energy efficiency of Neural Networks (NNs). Existing work concentrates on hardware architecture design and algorithm-hardware co-optimization, but neglects the non-negligible impact of the correlation between NN models and PIM architectures. To ensure high accuracy and energy efficiency, it is important to co-design the NN model and PIM architecture. However, on the one hand, the co-exploration space of NN model and PIM architecture is extremely tremendous, making searching for the optimal results difficult. On the other hand, during the co-exploration process, PIM simulators pose a heavy computational burden and runtime overhead for evaluation. To address these problems, in this paper, we propose an efficient co-exploration framework for the NN model and PIM architecture, named Gibbon. In Gibbon, we propose an evolutionary search algorithm with adaptive parameter priority, which focuses on subspace of high priority parameters and alleviates the problem of vast co-design space. Besides, we design a Recurrent Neural Network (RNN) based predictor for accuracy and hardware performances. It substitutes for a large part of the PIM simulator workload and reduces the long simulation time. Experimental results show that the proposed co-exploration framework can find better NN models and PIM architectures than existing studies in only seven GPU hours (8.4∼41.3× speedup). At the same time, Gibbon can improve the accuracy of co-design results by 10.7% and reduce the energy-delay-product by 6.48× compared with existing work. |
15:52 CET | 18.2.4 | AID: ACCURACY IMPROVEMENT OF ANALOG DISCHARGE-BASED IN-SRAM MULTIPLICATION ACCELERATOR Speaker: Saeed Seyedfaraji, Vienna University of Technology (TU-Wien), AT Authors: Saeed Seyedfaraji1, Baset Mesgari2 and Semeen Rehman3 1Institute of Computer Technology, TU Wien (TU Wien), AT; 2Vienna University of Technology, AT; 3TU Wien, AT Abstract This paper presents a novel technique to improve the accuracy of an energy-efficient in-memory multiplier using a standard 6T-SRAM. The state-of-the-art discharge-based in-SRAM multiplication accelerators suffer from a non-linear behavior in their bit-line (BL, BLB) due to the quadratic nature of the access transistor that leads to a poor signal-to-noise ratio (SNR). In order to achieve linearity in the BLB voltage, we propose a novel root function voltage technique on the access transistor's gate that results in an average SNR improvement of 10.77 dB compared to state-of-the-art discharge-based topologies. Our analytical methods and a circuit simulation in a 65 nm CMOS technology verify that the proposed technique consumes 0.523 pJ per computation (multiplication, accumulation, and preset) from a power supply of 1V, which is 51.18% lower compared to other state-of-the-art techniques. We have performed an extensive Monte Carlo based simulation for a 4x4 multiplication operation, and our novel technique presents less than 0.086 standard deviations for the worst-case incorrect output scenario. |
15:56 CET | 18.2.5 | Q&A SESSION Authors: Mohamed M. Sabry Aly1 and Huichu Liu2 1Nanyang Technological University, SG; 2Facebook Inc., US Abstract Questions and answers with the authors |
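The sketch below illustrates, in the simplest possible terms, the kind of offloading decision that a profiler such as PIMProf (18.2.1 above) automates: a region is worth offloading when the data movement it saves outweighs the slower in-memory compute. The cost model and every constant here are hypothetical placeholders, not PIMProf's actual model.

```python
# Back-of-the-envelope offloading decision for one code region
# (hypothetical cost model and constants, for illustration only).

def offload_to_pim(cpu_cycles, bytes_moved, pim_slowdown=4.0,
                   cpu_bytes_per_cycle=8.0, offload_overhead=10_000):
    """Return True if the estimated PIM cost beats the CPU cost for this region."""
    cpu_cost = cpu_cycles + bytes_moved / cpu_bytes_per_cycle   # compute + data movement
    pim_cost = cpu_cycles * pim_slowdown + offload_overhead     # slower cores, no movement
    return pim_cost < cpu_cost

# A memory-bound region favours PIM; a compute-bound one does not.
print(offload_to_pim(cpu_cycles=1_000_000, bytes_moved=512_000_000))  # True
print(offload_to_pim(cpu_cycles=1_000_000, bytes_moved=1_000_000))    # False
```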
18.3 Persistent Memory
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Joseph Friedman, UT Dallas, US
Session co-chair:
Chengmo Yang, University of Delaware, US
The non-volatility of emerging switching devices creates the opportunity for persistent memory that stores data without requiring a continuous supply of energy. This session therefore explores the implications of non-volatility and persistent memory for memory and cache architecture design. In particular, the presentations focus on locality, bandwidth, granularity, wearing, and application awareness.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 18.3.1 | CHARACTERIZING AND OPTIMIZING HYBRID DRAM-PM MAIN MEMORY SYSTEM WITH APPLICATION AWARENESS Speaker: Yongfeng Wang, Sun Yat-Sen University, CN Authors: Yongfeng Wang, Yinjin Fu, Yubo Liu, Zhiguang Chen and Nong Xiao, Sun Yat-Sen University, CN Abstract Persistent memory (PM) is typically used in combination with DRAM to configure hybrid main memory systems that can obtain both the high performance of DRAM and the large capacity of PM. There are critical management challenges in data placement, memory concurrency and workload scheduling for the concurrent execution of multiple application workloads. But the non-negligible performance gap between DRAM and PM makes the existing application-agnostic management strategies inefficient in reaching the full potential of hybrid memory. In this paper, we propose a series of application-aware optimization strategies, including application-aware data placement, adaptive thread allocation and inter-application interference avoidance, to improve the concurrent performance of different application workloads on hybrid memory. Finally, we provide the performance evaluation for our application-aware solutions on real hybrid memory hardware with some comprehensive benchmark suites. Our experimental results show that the duration of multi-application concurrent execution on hybrid memory can be reduced by up to 60.7% for application-aware data placement, 37.7% for adaptive thread allocation and 34.8% for workload scheduling with inter-application interference avoidance, respectively. The additive effects of all three optimization methods can reach 62.8% performance improvement with negligible overheads. |
15:44 CET | 18.3.2 | PATS: TAMING BANDWIDTH CONTENTION BETWEEN PERSISTENT AND DYNAMIC MEMORIES Speaker: Shucheng Wang, Huazhong University of Science and Technology, CN Authors: Shu Cheng Wang1, Qiang Cao1, Hong Jiang2 and Yuanyuan Dong3 1Huazhong University of Science and Technology, CN; 2University of Texas at Arlington, US; 3Alibaba Group, CN Abstract Emerging persistent memory (PM) with fast persistence and byte-addressability physically shares the memory channel with DRAM-based main memory. We experimentally uncover that the throughput of applications accessing DRAM collapses when multiple threads access PM, due to head-of-line blockage in the memory controller within the CPU. To address this problem, we design a PM-Accessing Thread Scheduling (PATS) mechanism that is guided by a contention model, to adaptively tune the maximum number of contention-free concurrent PM-threads. Experimental results show that even with 14 concurrent threads accessing PM, PATS allows only up to an 8% decrease in the DRAM throughput of the front-end applications (e.g., Memcached), while gaining a 1.5x PM-throughput speedup over the default configuration. |
15:48 CET | 18.3.3 | UNIFYING TEMPORAL AND SPATIAL LOCALITY FOR CACHE MANAGEMENT INSIDE SSDS Speaker: Jianwei Liao, Southwest University of China, CN Authors: Zhibing Sha1, Zhigang Cai1, Dong Yin2, Jianwei Liao1 and Francois Trahay3 1Southwest University of China, CN; 2Huaihua University, CN; 3Telecom Sudparis, FR Abstract To ensure better I/O performance of solid-state drives (SSDs), a dynamic random access memory (DRAM) is commonly equipped as a cache to absorb overwrites or writes and then avoid flushing them onto underlying SSD cells. This paper focuses on the management of the small cache inside SSDs. First, we propose to unify the factors of temporal and spatial locality using the visibility graph technique when running user applications, for directing cache management. Next, we propose to support batch adjustment of adjacent or nearby (hot) cached data pages by referring to the connection situations in the visibility graph of all cached pages. Finally, we propose to evict the buffered data pages in batches, to maximize the internal flushing parallelism of SSD devices, without worsening I/O congestion. The trace-driven simulation experiments show that our proposal can improve cache hits by more than 2.0% and the overall I/O latency by 19.3% on average, in contrast to conventional cache schemes inside SSDs. |
15:52 CET | 18.3.4 | DWR: DIFFERENTIAL WEARING FOR READ PERFORMANCE OPTIMIZATION ON HIGH-DENSITY NAND FLASH MEMORY Speaker: Liang Shi, School of Computer Science and Technology, East China Normal University, CN Authors: Yunpeng Song1, Qiao Li2, Yina Lv1, Changlong Li1 and Liang Shi1 1School of Computer Science and Technology, East China Normal University, CN; 2City University of Hong Kong, HK Abstract With the cost reduction and density optimization, the read performance and lifetime of high-density NAND flash memory have been significantly degraded during the last decade. Previous works proposed to optimize lifetime with wear leveling and optimize read performance with reliability improvement. However, with wearing, the reliability and read performance will be degraded along with the life of the device. To solve this problem, a differential wearing scheme (DWR) is proposed to optimize the read performance. The basic idea of DWR is to partition the flash memory into two areas and wear them at different speeds. For the area with low wearing speed, read operations are scheduled for read performance optimization. For the area with high wearing speed, write operations are scheduled but designed to avoid generating bad blocks early. Through careful design and real workloads evaluation on 3D TLC NAND flash, DWR achieves encouraging read performance optimization with negligible impacts to the lifetime. |
15:56 CET | 18.3.5 | GATLB: A GRANULARITY-AWARE TLB TO SUPPORT MULTI-GRANULARITY PAGES IN HYBRID MEMORY SYSTEM Speaker: Yujie Xie, Chongqing University, CN Authors: Yujuan Tan1, Yujie Xie1, Zhulin Ma1, Zhichao Yan2, Zhichao Zhang1, Duo Liu1 and Xianzhang Chen1 1Chongqing University, CN; 2Hewlett Packard Enterprise, US Abstract The parallel hybrid memory system that combines Non-volatile Memory (NVM) and DRAM can effectively expand the memory capacity. However, it puts significant pressure on the TLB due to its limited capacity. The superpage technology that manages pages with a large granularity (e.g., 2MB) is usually used to improve TLB performance. However, its coarse granularity conflicts with the fine-grained page migration in the hybrid memory system, resulting in serious invalid migration and page fragmentation problems. To solve these problems, we propose to maintain the coexistence of multi-granularity pages, and design a smart TLB called GATLB to support multi-granularity page management, coalesce consecutive pages and adapt to various changes in page size. Compared with the existing TLB technologies, GATLB can not only perceive page granularity to effectively expand the TLB coverage and reduce the miss rate, but also provide faster address translation with a much lower overhead. Our experimental evaluations show that GATLB can expand the TLB coverage by 7.09x, reduce the TLB miss rate by 91.1%, and shorten the address translation cycle by 49.41%. |
16:00 CET | 18.3.6 | Q&A SESSION Authors: Joseph Friedman1 and Chengmo Yang2 1University of Texas at Dallas, US; 2University of Delaware, US Abstract Questions and answers with the authors |
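The visibility-graph construction used in 18.3.3 above to unify temporal and spatial locality can be sketched in a few lines: each cached page's access count becomes a point in a series, and two pages are connected when the straight line between their points is not blocked by any point in between. The page-hotness values below are made up for illustration, and the in-SSD cache policy built on top of the graph is not reproduced.

```python
# Natural visibility graph over a series of per-page access counts (toy sketch).

def visibility_edges(series):
    """Connect i and j when no intermediate sample blocks the line between them."""
    edges = []
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.append((i, j))
    return edges

access_counts = [3, 1, 4, 1, 5, 2, 6]      # hypothetical hotness of 7 cached pages
print(visibility_edges(access_counts))     # well-connected pages are "hot" candidates
```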
18.4 Energy Efficient Platforms: from Autonomous Vehicles to Intermittent Computing
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Domenico Balsamo, Newcastle University, GB
Session co-chair:
Bart Vermeulen, NXP Semiconductors, NL
This session focuses on energy-efficient platforms and presents four papers. The first paper presents an efficient accelerator that enables real-time probabilistic 3D mapping at the edge for autonomous machines. Staying with autonomous vehicles, the second paper presents an FPGA solution for efficient real-time localization. Moving to tinier devices, the last two papers focus on energy-harvesting IoT devices that operate intermittently without batteries: the first presents a deep learning approach, while the second closes the session with an FPGA-based emulation of non-volatile digital logic for intermittent computing.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 18.4.1 | OMU: A PROBABILISTIC 3D OCCUPANCY MAPPING ACCELERATOR FOR REAL-TIME OCTOMAP AT THE EDGE Speaker: Tianyu Jia, Peking University, CN Authors: Tianyu Jia1, En-Yu Yang2, Yu-Shun Hsiao2, Jonathan Cruz2, David Brooks2, Gu-Yeon Wei2 and Vijay Janapa Reddi2 1Peking University, CN; 2Harvard University, US Abstract Autonomous machines (e.g., vehicles, mobile robots, drones) require sophisticated 3D mapping to perceive the dynamic environment. However, maintaining a real-time 3D map is expensive both in terms of compute and memory requirements, especially for resource-constrained edge machines. Probabilistic OctoMap is a reliable and memory-efficient 3D dense map model to represent the full environment, with dynamic voxel node pruning and expansion capacity. This paper presents the first efficient accelerator solution, i.e. OMU, to enable real-time probabilistic 3D mapping at the edge. To improve the performance, the input map voxels are updated via parallel PE units for data parallelism. Within each PE, the voxels are stored using a specially developed data structure in parallel memory banks. In addition, a pruning address manager is designed within each PE unit to reuse the pruned memory addresses. The proposed 3D mapping accelerator is implemented and evaluated using a commercial 12 nm technology. Compared to the ARM Cortex-A57 CPU in the Nvidia Jetson TX2 platform, the proposed accelerator achieves up to 62× performance and 708× energy efficiency improvement. Furthermore, the accelerator provides 63 FPS throughput, more than 2× higher than a real-time requirement, enabling real-time perception for 3D mapping. |
15:44 CET | 18.4.2 | AN FPGA OVERLAY FOR EFFICIENT REAL-TIME LOCALIZATION IN 1/10TH SCALE AUTONOMOUS VEHICLES Speaker: Paolo Burgio, University of Modena and Reggio Emilia, IT Authors: Andrea Bernardi1, Gianluca Brilli2, Alessandro Capotondi3, Andrea Marongiu4 and Paolo Burgio1 1University of Modena and Reggio Emilia, IT; 2Unimore, IT; 3Università di Modena e Reggio Emilia, IT; 4Università di Modena e Reggio Emilia, IT Abstract Heterogeneous systems-on-chip (HeSoC) based on reconfigurable accelerators, such as Field-Programmable Gate Arrays (FPGA), represent an appealing option to deliver the performance/Watt required by the advanced perception and localization tasks employed in the design of Autonomous Vehicles. Different from software-programmed GPUs, FPGA development involves significant hardware design effort, which in the context of HeSoCs is further complicated by the system-level integration of HW and SW blocks. High-Level Synthesis is increasingly being adopted to ease hardware IP design, allowing engineers to quickly prototype their solutions. However, automated tools still lack the required maturity to efficiently build the complex hardware/software interaction between the host CPU and the FPGA accelerator(s). In this paper we present a fully integrated system design where a particle filter for LiDAR-based localization is efficiently deployed as FPGA logic, while the rest of the compute pipeline executes on programmable cores. This design constitutes the heart of a fully-functional 1/10th-scale racing autonomous car. In our design, accelerated IPs are controlled locally to the FPGA via a proxy core. Communication between the two and with the host CPU happens via shared memory banks also implemented as FPGA IPs. This allows for a scalable and easy-to-deploy solution both from the hardware and software viewpoint, while providing better performance and energy efficiency compared to state-of-the-art solutions. |
15:48 CET | 18.4.3 | ENABLING FAST DEEP LEARNING ON TINY ENERGY-HARVESTING IOT DEVICES Speaker: Sahidul Islam, University of Texas at San Antonio, US Authors: Sahidul Islam1, Jieren Deng2, Shanglin Zhou2, Chen Pan3, Caiwen Ding2 and Mimi Xie1 1University of Texas at San Antonio, US; 2University of Connecticut, US; 3Texas A&M University-Corpus Christi, US Abstract Energy harvesting (EH) IoT devices that operate intermittently without batteries, coupled with advances in deep neural networks (DNNs), have opened up new opportunities for enabling sustainable smart applications. Nevertheless, implementing those computation- and memory-intensive intelligent algorithms on EH devices is extremely difficult due to the challenges of limited resources and an intermittent power supply that causes frequent failures. To address those challenges, this paper proposes a methodology that enables fast deep learning with low-energy accelerators for tiny energy harvesting devices. We first propose RAD, a resource-aware structured DNN training framework, which employs block-circulant matrices and structured pruning to achieve high compression for leveraging the advantage of various vector operation accelerators. A DNN implementation method, ACE, is then proposed that employs low-energy accelerators to achieve maximum performance with small energy consumption. Finally, we further design FLEX, the system support for intermittent computation in energy harvesting situations. Experimental results from three different DNN models demonstrate that RAD, ACE, and FLEX can enable fast and correct inference on energy harvesting devices with up to 4.26X runtime reduction and up to 7.7X energy reduction, with higher accuracy than the state-of-the-art. |
15:52 CET | 18.4.4 | EMULATION OF NON-VOLATILE DIGITAL LOGIC FOR BATTERYLESS INTERMITTENT COMPUTING Speaker: Simone Ruffini, University of Trento, IT Authors: Simone Ruffini, Kasim Sinan Yildirim and Davide Brunelli, University of Trento, IT Abstract Recent engineering efforts have given rise to devices that operate only by harvesting power from ambient energy sources, such as radiofrequency and solar energy. Due to the sporadic ambient energy sources, frequent power failures are inevitable for these devices that rely only on energy harvesting. These devices lose the values maintained in volatile hardware state elements upon a power failure. This situation leads to intermittent execution, which prevents the forward progress of computing operations. To counter power failures, these devices require non-volatile memory elements, e.g., FRAM, to store the computational state. However, hardware designers can only represent volatile state elements using FPGAs in the market and current hardware description languages. As of now, there is no existing solution for fast-prototyping non-volatile digital logic. This paper enables FPGA-based emulation of any custom non-volatile digital logic for intermittent computing. Therefore, our proposal can be a standard part of the current FPGA libraries provided by the vendors to design and validate future non-volatile logic designs targeting intermittent computing. |
15:56 CET | 18.4.5 | Q&A SESSION Authors: Domenico Balsamo1 and Bart Vermeulen2 1Newcastle University, GB; 2NXP Semiconductors, NL Abstract Questions and answers with the authors |
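Block-circulant weight compression, the structured-compression ingredient of RAD in 18.4.3 above, rests on a simple identity: a circulant block is fully described by one vector, and its matrix-vector product is a circular convolution computable with FFTs. The sketch below only verifies that identity; the training, pruning and accelerator mapping described in the paper are not reproduced.

```python
# Circulant matrix-vector product via FFT, the building block of block-circulant
# weight compression (illustrative sketch only).
import numpy as np

def circulant(first_col):
    """Dense circulant matrix whose j-th column is first_col rolled down by j."""
    n = len(first_col)
    return np.stack([np.roll(first_col, j) for j in range(n)], axis=1)

def circulant_matvec(first_col, x):
    """y = C @ x computed as a circular convolution in O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

rng = np.random.default_rng(0)
c, x = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose(circulant(c) @ x, circulant_matvec(c, x))
# A k x k weight block is stored as a single length-k vector instead of k*k entries.
```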
18.5 Circuit Optimization and Analysis: No Time to Lose
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Eleonora Testa, Synopsys Inc., CH
Session co-chair:
Ibrahim Elfadel, Khalifa University, AE
This session presents papers that focus on timing optimization in both logic and physical synthesis. Also, it presents a new approach to flip-chip routing. The first paper proposes an algorithm that can fix minimum implant area violations in a timing-aware fashion without displacing cells, fixing the violations only by applying cell swapping. The second paper proposes a momentum-based timing-driven global placement algorithm, and the third paper describes an efficient way to parallelize dynamic timing computation on the critical path. The last paper shows a substrate routing algorithm using a novel ring routing model that handles symmetry and shielding constraints.
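As a purely illustrative aside on the momentum-based net weighting mentioned above, the update can be pictured as an exponential moving average of each net's timing criticality. The update rule and constants below are our own illustrative choices and may differ from the scheme actually used in DREAMPlace 4.0.

```python
# Momentum-style net weighting sketch for timing-driven placement (illustrative only).

def update_net_weights(weights, criticality, beta=0.9, boost=1.0):
    """Blend each net's previous weight with its current timing criticality."""
    return {net: beta * w + (1.0 - beta) * (1.0 + boost * criticality.get(net, 0.0))
            for net, w in weights.items()}

weights = {"net_a": 1.0, "net_b": 1.0}
criticality = {"net_a": 0.8}          # e.g. a normalized negative-slack value in [0, 1]
for _ in range(5):
    weights = update_net_weights(weights, criticality)
print(weights)   # net_a's weight drifts towards 1.8, net_b's stays at 1.0
```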
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 18.5.1 | A SYSTEMATIC REMOVAL OF MINIMUM IMPLANT AREA VIOLATIONS UNDER TIMING CONSTRAINT Speaker: Eunsol Jeong, Seoul National University, KR Authors: Eunsol Jeong, Heechun Park and Taewhan Kim, Seoul National University, KR Abstract Fixing minimum implant area (MIA) violations in post-route layout is an essential and inevitable task for the high-performance designs employing multiple threshold voltages. Unlike the conventional approaches, which have tried to locally move cells or reassign Vt (threshold voltage) of some cells in a way to resolve the MIA violations with little or no consideration of timing constraint, our proposed approach fully and systematically controls the timing budget during the removal of MIA violations. Precisely, our solution consists of three sequential steps: (1) performing critical path aware cell selection for Vt reassignment to fix the intra-row MIA violations while considering timing constraint and minimal power increments; (2) performing a theoretically optimal Vt reassignment to fix the inter-row MIA violations while satisfying both of the intra-row MIA and timing constraints; (3) refining Vt reassignment to further reduce the power consumption while meeting intra- and inter-row MIA constraints as well as timing constraint. Experiments through benchmark circuits show that our proposed approach is able to completely resolve MIA violations while ensuring no timing violation and achieving much less power increments over that by the conventional approaches. |
15:44 CET | 18.5.2 | DREAMPLACE 4.0: TIMING-DRIVEN GLOBAL PLACEMENT WITH MOMENTUM-BASED NET WEIGHTING Speaker: Peiyu Liao, The Chinese University of Hong Kong, HK Authors: Peiyu Liao1, Siting Liu1, Zhitang Chen2, Wenlong Lv3, Yibo Lin4 and Bei Yu1 1The Chinese University of Hong Kong, HK; 2Huawei Noah's Ark Lab, HK; 3Huawei Noah's Ark Lab, CN; 4Peking University, CN Abstract Timing optimization is critical to integrated circuit (IC) design closure. Existing global placement algorithms mostly focus on wirelength optimization without considering timing. In this paper, we propose a timing-driven global placement algorithm leveraging a momentum-based net weighting strategy. In addition, we improve the preconditioner to incorporate our net weighting scheme. Experimental results on ICCAD 2015 contest benchmarks demonstrate that our algorithm can significantly improve total negative slack (TNS) and meanwhile be beneficial to worst negative slack (WNS). |
15:48 CET | 18.5.3 | EVENTTIMER: FAST AND ACCURATE EVENT-BASED DYNAMIC TIMING ANALYSIS Speaker: Zuodong Zhang, Institute of Microelectronics, Peking University, CN Authors: Zuodong Zhang, Zizheng Guo, Yibo Lin, Runsheng Wang and Ru Huang, Peking University, CN Abstract As the transistor shrinks to nanoscale, the overhead of ensuring circuit functionality becomes extremely large due to the increasing timing variations. Thus, better-than-worst-case design (BTWC) has attracted more and more attention. Many of these techniques utilize dynamic timing slack (DTS) and activity information for design optimization and runtime tuning. Existing DTS computation methods are essentially a modification to the worst-case delay information, which cannot guarantee exact DTS and activity simulation, causing performance degradation in timing optimization. Therefore, in this paper, we propose EventTimer, a dynamic timing analysis engine based on event propagation to accurately compute DTS and activity information. We evaluate its accuracy and efficiency on different benchmark circuits. The experimental results show that EventTimer can achieve exact DTS computation with high efficiency. They also show that EventTimer scales well with the circuit size and the number of CPU threads, which makes it possible to use it in application-level analysis. |
15:52 CET | 18.5.4 | PRACTICAL SUBSTRATE DESIGN CONSIDERING SYMMETRICAL AND SHIELDING ROUTES Speaker: Hung-Ming Chen, National Yang Ming Chiao Tung University, TW Authors: Hao-Yu Chi1, Yi-Hung Chen2, Hung-Ming Chen1, Chien-Nan Liu3, Yun-Chih Kuo2, Ya-Hsin Chang2 and Kuan-Hsien Ho2 1National Yang Ming Chiao Tung University, TW; 2Mediatek Inc. Taiwan, TW; 3National Yang Ming Chiao Tung University, TW Abstract In modern package design, the flip-chip package has become mainstream because of the benefit of its high I/O pin count. However, the package design is still done manually in the industry. The lack of automation tools makes the package design cycle longer due to complex routing constraints and frequent modification requests. In this work, we propose yet another routing framework for substrate routing. Compared with previous works, our routing algorithm generates a feasible routing solution in a few seconds for industrial designs and considers important symmetry and shielding constraints that have not been handled before. Benefiting from the efficiency of our routing algorithm, the designer can get the result immediately and accommodate some modifications to reduce the cost. The experimental results show that the routing result generated by our router is of good quality, very close to the manual design. |
15:56 CET | 18.5.5 | Q&A SESSION Authors: Eleonora Testa1 and Ibrahim (Abe) Elfadel2 1Synopsys Inc., CH; 2Khalifa University, AE Abstract Questions and answers with the authors |
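As background for presentation 18.5.2, a momentum-based net weighting scheme can be pictured as an exponential moving average that gradually boosts the weight of timing-critical nets between placement iterations. The minimal sketch below illustrates only that idea; the criticality measure, parameters, and update rule are assumptions for illustration and are not taken from the DREAMPlace 4.0 implementation.

```python
import numpy as np

def update_net_weights(weights, net_slacks, momentum=0.9, boost=1.0):
    """Illustrative momentum-style net-weight update for timing-driven placement.

    weights    : current per-net weights (1-D array)
    net_slacks : worst slack observed on each net this iteration (negative = critical)
    The criticality measure and the update rule are illustrative assumptions only.
    """
    # Map slack to a criticality score in [0, 1]; nets with negative slack are critical.
    criticality = np.clip(-net_slacks / (np.abs(net_slacks).max() + 1e-12), 0.0, 1.0)
    # Momentum-style (exponential moving average) accumulation of criticality into weights.
    target = 1.0 + boost * criticality          # desired emphasis for critical nets
    return momentum * weights + (1.0 - momentum) * target

# Toy usage: three nets, the second one violates timing.
w = np.ones(3)
slacks = np.array([0.2, -0.5, 0.05])
for _ in range(5):
    w = update_net_weights(w, slacks)
print(w)  # the critical net's weight grows smoothly across iterations
```

In a placer, such weights would scale each net's contribution to the wirelength objective, so critical nets are pulled tighter in subsequent iterations.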
18.6 Multi-Partner Projects – Session 2
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Ernesto Sanchez, Politecnico di Torino, IT
Session co-chair:
Maksim Jenihhin, Tallinn UT, EE
The session is dedicated to multi-partner innovative and high-tech research projects addressing the DATE 2022 topics. The types of collaboration covered include projects funded by EU schemes (H2020, ESA, EIC, MSCA, COST, etc.), nationally and regionally funded projects, and collaborative research projects funded by industry. Depending on the stage of the project, the papers present the novelty of the project concepts, the relevance of the technical objectives to the DATE community, technical highlights of the project results, and insights into the lessons learnt or the issues that remain open until the end of the project. In particular, this session focuses on projects tackling the challenges of artificial intelligence and deep learning and the integration of hardware and software layers, and it also presents a cross-sectoral collaboration for a graduate school project.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 18.6.1 | NEUROTEC I: NEURO-INSPIRED ARTIFICIAL INTELLIGENCE TECHNOLOGIES FOR THE ELECTRONICS OF THE FUTURE Speaker: Christopher Bengel, Institute of Materials in Electrical Engineering II, RWTH Aachen University, DE Authors: Melvin Galicia1, Stephan Menzel2, Farhad Merchant1, Maximilian Müller3, Hsin-Yu Chen2, Qing-Tai Zhao4, Felix Cüppers2, Abdur R. Jalil4, Qi Shu5, Peter Schüffelgen4, Gregor Mussler4, Carsten Funck6, Christian Lanius7, Stefan Wiefels2, Moritz von Witzleben6, Christopher Bengel6, Nils Kopperberg6, Tobias Ziegler6, Rana Ahmad2, Alexander Krüger2, Leticia Pöhls7, Regina Dittmann2, Susanne Hoffmann-Eifert2, Vikas Rana2, Detlev Grützmacher4, Matthias Wuttig3, Dirk Wouters6, Andrei Vescan8, Tobias Gemmeke7, Joachim Knoch9, Max Lemme10, Rainer Leupers1 and Rainer Waser6 1Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE; 2Peter-Grünberg Institut-7, Forschungszentrum Jülich GmbH, DE; 3Institute of Physics, Physics of Novel Materials, RWTH Aachen University,, DE; 4Peter-Grünberg Institut-9, Forschungszentrum Jülich GmbH, DE; 5Peter-Grünberg Institut-10, Forschungszentrum Jülich GmbH, DE; 6Institute of Materials in Electrical Engineering II, RWTH Aachen University, DE; 7Institute of Integrated Digital Systems and Circuit Design, RWTH Aachen University, DE; 8Compound Semiconductor Technology, RWTH Aachen University, DE; 9Institute of Semiconductor Electronics, RWTH Aachen University, DE; 10Chair of Electronic Devices, RWTH Aachen University, DE Abstract The field of neuromorphic computing is approaching an era of rapid adoption driven by the urgent need of a substitute for the von Neumann computing architecture. NEUROTEC I: "Neuro-inspired Artificial Intelligence Technologies for the Electronics of the Future" project is an initiative sponsored by the German Federal Ministry of Education and Research (BMBF for its initials in German), that aims to effectively advance the foundations for the utilization and exploitation of neuromorphic computing. NEUROTEC I stands at its successful "final stage" driven by the collaboration from more than 8 institutes from the Jülich Research Center and the RWTH Aachen University, as well as collaboration from several high-tech industry partners. The NEUROTEC I project considers the field interplay among materials, circuits, design and simulation tools. This paper provides an overview of the project’s overall structure and discusses the scientific achievements of its individual activities. |
15:44 CET | 18.6.2 | VEDLIOT: VERY EFFICIENT DEEP LEARNING IN IOT Speaker: Jens Hagemeyer, Bielefeld University, DE Authors: Martin Kaiser1, Rene Griessl1, Nils Kucza1, Carola Haumann1, Lennart Tigges1, Kevin Mika1, Jens Hagemeyer1, Florian Porrmann1, Ulrich Rückert1, Micha vor dem Berge2, Stefan Krupop3, Mario Porrmann4, Marco Tassemeier4, Pedro Trancoso5, Fareed Qararyah5, Stavroula Zouzoula5, Antonio Casimiro6, Alysson Bessani6, José Cecilio6, Stefan Andersson7, Oliver Brunnegard7, Olof Eriksson7, Roland Weiss8, Franz Meierhöfer8, Hans Salomonsson9, Elaheh Malekzadeh9, Daniel Ödman9, Anum Khurshid10, Pascal Felber11, Marcelo Pasin11, Valerio Schiavoni11, James Menetrey11, Karol Gugula12, Piotr Zierhoffer12, Eric Knauss13 and Hans-Martin Heyn13 1Bielefeld University, DE; 2Christmann Informationstechnik, DE; 3christmann informationstechnik, DE; 4Osnabrück University, DE; 5Chalmers University of Technology, SE; 6University of Lisbon, PT; 7VEONEER Inc., SE; 8Siemens AG, DE; 9EMBEDL AB, SE; 10Research Institutes of Sweden AB (RISE), SE; 11University of Neuchatel, CH; 12Antmicro, PL; 13Göteborg University, SE Abstract The VEDLIoT project targets the development of energy-efficient Deep Learning for distributed AIoT applications. A holistic approach is used to optimize algorithms while also dealing with safety and security challenges. The approach is based on a modular and scalable cognitive IoT hardware platform. Using modular microserver technology enables the user to configure the hardware to satisfy a wide range of applications. VEDLIoT offers a complete design flow for Next-Generation IoT devices required for collaboratively solving complex Deep Learning applications across distributed systems. The methods are tested on various use-cases ranging from Smart Home to Automotive and Industrial IoT appliances. VEDLIoT is an H2020 EU project which started in November 2020. It is currently in an intermediate stage with the first results available. |
15:48 CET | 18.6.3 | INTELLIGENT METHODS FOR TEST AND RELIABILITY Speaker: Hussam Amrouch, University of Stuttgart, DE Authors: Hussam Amrouch1, Jens Anders1, Steffen Becker1, Maik Betka1, Gerd Bleher2, Peter Domanski1, Nourhan Elhamawy1, Thomas Ertl1, Athanasios Gatzastras1, Paul R. Genssler1, Sebastian Hasler1, Martin Heinrich2, Andre van Hoorn1, Hanieh Jafarzadeh1, Ingmar Kallfass1, Florian Klemme1, Steffen Koch1, Ralf Küsters1, Andrés Lalama1, Raphael Latty2, Yiwen Liao1, Natalia Lylina1, Zahra Paria Najafi-Haghi1, Dirk Pflüger1, Ilia Polian1, Jochen Rivoir2, Matthias Sauer2, Denis Schwachhofer1, Steffen Templin2, Christian Volmer2, Stefan Wagner1, Daniel Weiskopf1, Hans-Joachim Wunderlich1, Bin Yang1 and Martin Zimmermann2 1University of Stuttgart, DE; 2Advantest Corporation, DE Abstract Test methods that can keep up with the ongoing increase in complexity of semiconductor products and their underlying technologies are an essential prerequisite for maintaining quality and safety of our daily lives and for continued success of our economies and societies. There is a huge potential how test methods can benefit from recent breakthroughs in domains such as artificial intelligence, data analytics, virtual/augmented reality, and security. The Graduate School on “Intelligent Methods for Semiconductor Test and Reliability” (GS-IMTR) at the University of Stuttgart is a large-scale, radically interdisciplinary effort to address the scientific-technological challenges in this domain. It is funded by Advantest, one of the world leaders in automatic test equipment. In this paper, we describe the overall philosophy of the Graduate School and the specific scientific questions targeted by its ten projects. |
15:52 CET | 18.6.4 | EVOLVE: TOWARDS CONVERGING BIG-DATA, HIGH-PERFORMANCE AND CLOUD-COMPUTING WORLDS Speaker: Achilleas Tzenetopoulos, National TU Athens, GR Authors: Achilleas Tzenetopoulos1, Dimosthenis Masouros1, Konstantina Koliogeorgi1, Sotirios Xydis2, Dimitrios Soudris1, Antony Chazapis3, Christos Kozanitis3, Angelos Bilas4, Christian Pinto5, Huy-Nam Nguyen6, Stelios Louloudakis7, Georgios Gardikis8, George Vamvakas8, Michelle Aubrun9, Christy Symeonidou10, Vassilis Spitadakis10, Konstantinos Xylogiannopoulos11, Bernhard Peischl11, Tahir Kalayci12, Alexander Stocker12 and Jean-Thomas Acquaviva13 1National TU Athens, GR; 2Harokopio University of Athens, GR; 3Institute of Computer Science, FORTH, GR; 4FORTH and University of Crete, GR; 5IBM Research, IE; 6Atos/BULL, FR; 7Sunlight.io, GR; 8Space Hellas S.A., GR; 9Thales Alenia Space, FR; 10Neurocom, LU; 11AVL List GmbH, AT; 12Virtual Vehicle Research GmbH, AT; 13DataDirect Networks, FR Abstract EVOLVE is a pan-European Innovation Action that aims to fully-integrate High-Performance-Computing (HPC) hardware with state-of-the-art software technologies under a unique testbed, that enables the convergence of HPC, Cloud, and Big-Data worlds and increases our ability to extract value from massive and demanding datasets. EVOLVE's advanced compute platform combines HPC-enabled capabilities, with transparent deployment in high abstraction level, and a versatile Big-Data processing stack for end-to-end workflows. Hence, domain experts have the potential to improve substantially the efficiency of existing services or introduce new models in the respective domains, e.g., automotive services, bus transportation, maritime surveillance, and others. In this paper, we describe EVOLVE's testbed and evaluate the performance of the integrated pilots from different domains. |
15:56 CET | 18.6.5 | SDK4ED: ONE-CLICK PLATFORM FOR ENERGY-AWARE, MAINTAINABLE AND DEPENDABLE APPLICATIONS Speaker: Charalampos Marantos, National TU Athens, GR Authors: Charalampos Marantos1, Miltiadis Siavvas2, Dimitrios Tsoukalas2, Christos Lamprakos3, Lazaros Papadopoulos1, Paweł Boryszko4, Katarzyna Filus4, Joanna Domańska4, Apostolos Ampatzoglou5, Alexander Chatzigeorgiou6, Erol Gelenbe4, Dionysios Kehagias2 and Dimitrios Soudris1 1National TU Athens, GR; 2Centre for Research and Technology Hellas, Thessaloniki, GR; 3School of ECE, National TU Athens, GR; 4Institute of Theoretical & Applied Computer Science, IITIS-PAN, Gliwice, PL; 5University of Macedonia, GR; 6Department of Applied Informatics, University of Macedonia, GR Abstract Developing modern secure and low-energy applications in a short time imposes new challenges and creates the need of designing new software tools to assist developers in all phases of application development. The design of such tools cannot be considered a trivial task, as they should be able to provide optimization of multiple quality requirements. In this paper, we introduce the SDK4ED platform, which incorporates advanced methods and tools for measuring and optimizing maintainability, dependability and energy. The presented solution offers a complete tool-flow for providing indicators and optimization methods with emphasis on embedded software. Effective forecasting models and decision-making solutions are also implemented to improve the quality of the software, respecting the constraints imposed on maintenance standards, energy consumption limits and security vulnerabilities. The use of the SDK4ED platform is demonstrated in a healthcare embedded application. |
16:00 CET | 18.6.6 | Q&A SESSION Authors: Ernesto Sanchez1 and Maksim Jenihhin2 1Politecnico di Torino, IT; 2Tallinn University of Technology, EE Abstract Questions and answers with the authors |
19.1 Hardware security primitives and attacks
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Johanna Sepulveda, Airbus Defense and Space, DE
Session co-chair:
Jorge Guajardo, Bosch, US
The first three papers in this session discuss hardware security attacks. The first paper presents a novel methodology for verifying the DPA security of masked hardware circuits. The second paper discusses a mechanism to activate capacitive triggers for Hardware Trojans. The third paper presents an attack based on the voltage-drop effect in an SoC composed of an FPGA and a CPU. The last paper in the session is on physically unclonable functions; more precisely, it proposes three new evaluation methods aimed at higher-order alphabet PUFs (a simple bias-test sketch follows this session's table).
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 19.1.1 | (Best Paper Award Candidate) ADD-BASED SPECTRAL ANALYSIS OF PROBING SECURITY Speaker: Maria Chiara Molteni, Universita' degli Studi di Milano, IT Authors: Maria Chiara Molteni1, Vittorio Zaccaria2 and Valentina Ciriani1 1Universita' degli Studi di Milano, IT; 2Politecnico di Milano, IT Abstract In this paper, we introduce a novel exact verification methodology for non-interference properties of cryptographic circuits. The methodology exploits the Algebraic Decision Diagram representation of the Walsh spectrum to overcome the potential slow down associated with its exact verification against noninterference constraints. Benchmarked against a standard set of use cases, the methodology speeds-up 1.88x the median verification time over the existing state-of-the art tools for exact verification. |
16:44 CET | 19.1.2 | GUARANTEED ACTIVATION OF CAPACITIVE TROJAN TRIGGERS DURING POST PRODUCTION TEST VIA SUPPLY PULSING Speaker: Sule Ozev, ASU, US Authors: Bora Bilgic and Sule Ozev, ASU, US Abstract Involvement of many parties in the production of ICs makes the process more vulnerable to tampering. Consequently, IC security has become an important challenge to tackle. One of the threat models in the hardware security domain is the insertion of unwanted and malicious hardware components, known as Hardware Trojans. A malicious attacker can insert a small modification into the functional circuit that can cause havoc in the field. To make the Trojan circuit stealthy, typically trigger circuits are used, which not only hide the Trojan activity during post-production testing, but also randomize activation conditions, thereby making it very difficult to diagnose even after failures. Trigger mechanisms for Trojans typically delay and randomize the outcome based on a subset of internal digital signals. While there are many different ways of implementing the trigger mechanisms, charge based mechanisms have gained popularity due to their small size. In this paper, we propose a scheme to ensure that the trigger mechanisms are activated during production testing even if the conditions specified by the malicious attacker are not met. By disabling the mechanism by which the Trojan remains stealthy, any of the parametric techniques can be used to detect potential Trojans at production time. The proposed technique relies on supply pulsing, where we generate a potential differential between the non-active input and output of any digital gate regardless of the signal pattern that the trigger mechanism is tied to. SPICE simulations show that our method works well even for the smallest Trojan trigger mechanisms. |
16:48 CET | 19.1.3 | FPGA-TO-CPU UNDERVOLTING ATTACKS Speaker: Dina Mahmoud, EPFL, CH Authors: Dina Mahmoud1, Samah Hussein1, Vincent Lenders2 and Mirjana Stojilovic1 1EPFL, CH; 2Armasuisse, CH Abstract FPGAs are proving useful and attractive for many applications, thanks to their hardware reconfigurability, low power, and high-degree of parallelism. As a result, modern embedded systems are often based on systems-on-chip (SoCs), where CPUs and FPGAs share the same die. In this paper, we demonstrate the first undervolting attack in which the FPGA acts as an aggressor while the CPU, residing on the same SoC, is the victim. We show that an adversary can use the FPGA fabric to create a significant supply voltage drop which, in turn, faults the software computation performed by the CPU. Additionally, we show that an attacker can, with an even higher success rate, execute a denial-of-service attack, without any modification of the underlying hardware or the power distribution network. Our work exposes a new electrical-level attack surface, created by tight integration of CPUs and FPGAs in modern SoCs, and incites future research on countermeasures. |
16:52 CET | 19.1.4 | BEWARE OF THE BIAS - STATISTICAL PERFORMANCE EVALUATION OF HIGHER-ORDER ALPHABET PUFS Speaker: Christoph Frisch, TU Munich, DE Authors: Christoph Frisch and Michael Pehl, TU Munich, DE Abstract Physical Unclonable Functions (PUFs) derive unpredictable and device-specific responses from uncontrollable manufacturing variations. While most of the PUFs provide only one response bit per PUF cell, deriving more bits such as a symbol from a higher-order alphabet would make PUF designs more efficient. This type of PUFs is thus suggested in some applications and subject to current research. However, only a few methods are available to analyze the statistical performance of such higher-order alphabet PUFs. This work, therefore, introduces various novel schemes. Unlike previous works, the new approaches involve statistical hypothesis testing. This facilitates more refined and statistically significant statements about the PUF regarding bias effects. We utilize real-world PUF data to illustrate the capabilities of the tests. In comparison to state-of-the-art approaches, our methods indeed capture more aspects of bias. Overall, this work is a step towards improved quality control of higher-order alphabet PUFs. |
16:56 CET | 19.1.5 | Q&A SESSION Authors: Johanna Sepúlveda1 and Jorge Guajardo2 1Airbus Defence and Space, DE; 2Bosch Research and Technology Center, Robert Bosch LLC, US Abstract Questions and answers with the authors |
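Presentation 19.1.4 advocates statistical hypothesis testing for evaluating PUF bias. The sketch below shows the simplest instance of that idea for single-bit responses, a two-sided binomial test (it assumes SciPy ≥ 1.7 is available); the paper's methods target higher-order alphabets and are more elaborate, so this is an illustrative assumption rather than the authors' procedure.

```python
import random
from scipy.stats import binomtest

def bias_test(responses, alpha=0.01):
    """Two-sided binomial test: are the PUF response bits biased away from 0.5?

    responses : iterable of 0/1 response bits collected from one PUF instance.
    Returns (estimated bias, p-value, reject_null). Illustrative only; the
    paper targets higher-order alphabets, not single bits.
    """
    ones = sum(responses)
    n = len(responses)
    result = binomtest(ones, n, p=0.5, alternative='two-sided')
    return ones / n, result.pvalue, result.pvalue < alpha

# Toy usage with a mildly biased response sequence.
random.seed(0)
bits = [1 if random.random() < 0.58 else 0 for _ in range(2000)]
print(bias_test(bits))  # bias estimate near 0.58, small p-value, null rejected
```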
19.2 Hardware components and architectures for Machine Learning
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Charles Mackin, IBM, US
Session co-chair:
Mladen Berekovic, University of Lübeck, DE
This session is dedicated to new advances in hardware components and architectures for ML. The first paper focuses on improving the energy consumption, latency, and application throughput of neuromorphic implementations; the second one proposes a near hybrid memory accelerator integrated close to the DRAM to improve inference; the third one presents a novel tensor processor with superior PPA metrics compared to the state of the art; the fourth paper presents a new mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors (a software sketch of quantized-weight inference follows this session's table). Two IP papers complete the session: the first one presents a hybrid RRAM-SRAM system for Deep Neural Networks, while the second one is the first work to deploy a large neural network on FPGA-based neuromorphic hardware.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 19.2.1 | DESIGN OF MANY-CORE BIG LITTLE μBRAINS FOR ENERGY-EFFICIENT EMBEDDED NEUROMORPHIC COMPUTING Speaker: Lakshmi Varshika Mirtinti, Drexel University, US Authors: M. Lakshmi Varshika1, Adarsha Balaji1, Federico Corradi2, Anup Das1, Jan Stuijt2 and Francky Catthoor3 1Drexel University, US; 2Imec, NL; 3Imec, BE Abstract As spiking-based deep learning inference applications are increasing in embedded systems, these systems tend to integrate neuromorphic accelerators such as μBrain to improve energy efficiency. We propose a μBrain-based scalable many-core neuromorphic hardware design to accelerate the computations of spiking deep convolutional neural networks (SDCNNs). To increase energy efficiency, cores are designed to be heterogeneous in terms of their neuron and synapse capacity (i.e., big vs. little cores), and they are interconnected using a parallel segmented bus interconnect, which leads to lower latency and energy compared to a traditional mesh-based Network-on-Chip (NoC). We propose a system software framework called SentryOS to map SDCNN inference applications to the proposed design. SentryOS consists of a compiler and a run-time manager. The compiler compiles an SDCNN application into sub-networks by exploiting the internal architecture of big and little μBrain cores. The run-time manager schedules these sub-networks onto cores and pipeline their execution to improve throughput. We evaluate the proposed big little many-core neuromorphic design and the system software framework with five commonly-used SDCNN inference applications and show that the proposed solution reduces energy (between 37% and 98%), reduces latency (between 9% and 25%), and increases application throughput (between 20% and 36%). We also show that SentryOS can be easily extended for other spiking neuromorphic accelerators such as Loihi and DYNAPs. |
16:44 CET | 19.2.2 | HYDRA: A NEAR HYBRID MEMORY ACCELERATOR FOR CNN INFERENCE Speaker: Palash Das, Indian Institute of Technology, Guwahati, IN Authors: Palash Das1, Ajay Joshi2 and Hemangee Kapoor1 1Indian Institute of Technology, Guwahati, IN; 2Boston University, US Abstract Convolutional neural network (CNN) accelerators often suffer from limited off-chip memory bandwidth and on-chip capacity constraints. One solution to this problem is near-memory or in-memory processing. Non-volatile memory, such as phase-change memory (PCM), has emerged as a promising DRAM alternative. It is also used in combination with DRAM, forming a hybrid memory. Though near-memory processing (NMP) has been used to accelerate the CNN inference, the feasibility/efficacy of NMP remained unexplored for a hybrid main memory system. Additionally, PCMs are also known to have low write endurance, and therefore, the tremendous amount of writes generated by the accelerators can drastically hamper the longevity of the PCM memory. In this work, we propose Hydra, a near hybrid memory accelerator integrated close to the DRAM to execute inference. The PCM banks store the models that are only read by the memory controller during the inference. For entire forward propagation (inference), the intermediate writes from Hydra are entirely performed to the DRAM, eliminating PCM-writes to enhance PCM lifetime. Unlike the other in-DRAM processing-based works, Hydra does not eliminate any multiplication operations by using binary or ternary neural networks, making it more suitable for the requirement of high accuracy. We also exploit inter- and intra-chip (DRAM chip) parallelism to improve the system's performance. On average, Hydra achieves around 20x performance improvements over the in-DRAM processing-based state-of-the-art works while accelerating the CNN inference. |
16:48 CET | 19.2.3 | TCX: A PROGRAMMABLE TENSOR PROCESSOR Speaker: Tailin Liang, University of Science and Technology Beijing, CN Authors: Tailin Liang1, Lei Wang1, Shaobo Shi2, John Glossner1 and Xiaotong Zhang1 1University of Science and Technology Beijing, CN; 2Hua Xia General Processor Technologies, CN Abstract Neural network processors and accelerators are domain-specific architectures deployed to solve the high computational requirements of deep learning algorithms. This paper proposes a new instruction set extension for tensor computing, TCX, with RISC-style instructions and variable length tensor extensions. It features a multi-dimensional register file, dimension registers, and fully generic tensor instructions. It can be seamlessly integrated into existing RISC ISAs and provides software compatibility for scalable hardware implementations. We present an implementation of the TCX tensor computing accelerator using an out-of-order microarchitecture implementation. The tensor accelerator is scalable in computation units from several hundred to tens of thousands. An optimized register renaming mechanism is described which allows for many physical tensor registers without requiring architectural support for large tensor register names. We describe new tensor load and store instructions that reduce bandwidth requirements based on tensor dimensions. Implementations may balance data bandwidth and computation utilization for different types of tensor computations such as element-wise, depth-wise, and matrix-multiplication. We characterize the computation precision of tensor operations to balance area, generality, and accuracy loss for several well-known neural networks. The TCX processor runs at 1 GHz and sustains 8.2 Tera operations per second using a 4096 multiplication-accumulation compute unit with up to 98.83% MAC utilization. It consumes 12.8 square millimeters while dissipating 0.46 Watts per TOP in TSMC 28nm technology. |
16:52 CET | 19.2.4 | A FLASH-BASED CURRENT-MODE IC TO REALIZE QUANTIZED NEURAL NETWORKS Speaker: Kyler Scott, Texas A&M University, US Authors: Kyler Scott1, Cheng-Yen Lee1, Sunil Khatri1 and Sarma Vrudhula2 1Texas A&M University, US; 2Arizona State University, US Abstract This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a ~50x reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other. |
16:56 CET | 19.2.5 | Q&A SESSION Authors: Charles Mackin1 and Mladen Berekovic2 1IBM, US; 2University of Lübeck, DE Abstract Questions and answers with the authors |
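Presentation 19.2.4 maps quantized neural network weights onto flash transistors and computes in the analog current domain. As a rough software analogue only, the sketch below quantizes weights to a few discrete levels and evaluates a layer with them; the level count, scaling, and rounding scheme are illustrative assumptions and do not reflect the paper's circuit behaviour.

```python
import numpy as np

def quantize_weights(w, max_level=2):
    """Uniformly quantize weights to integer levels in [-max_level, max_level] (illustrative)."""
    scale = np.abs(w).max() / max_level
    q = np.clip(np.round(w / scale), -max_level, max_level)
    return q.astype(np.int8), scale

def qnn_layer(x, w_q, scale):
    """Matrix-vector product with quantized weights, then rescale to real-valued outputs."""
    return (w_q.astype(np.float32) @ x) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)).astype(np.float32)
x = rng.normal(size=16).astype(np.float32)
Wq, s = quantize_weights(W)
# Quantization error of the layer output versus the full-precision reference.
print(np.max(np.abs(W @ x - qnn_layer(x, Wq, s))))
```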
19.3 NoC optimization with emerging technologies
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Romain Lemaire, CEA, FR
Session co-chair:
Sebastien Le Beux, Concordia University, CA
Networks-on-chip, and more generally on-chip communication architectures, have to be constantly improved to address new application constraints and to take advantage of innovations in system integration technologies. This session presents various approaches illustrating these topics. First, at design time, a framework is proposed to estimate NoC performance using a graph neural network (a sketch of the underlying attributed-graph model follows this session's table). Then, at execution time, two adaptive routing algorithms are detailed: one based on an optimized credit flow control between routers and the other targeting 2.5D topologies in the presence of faulty links. Finally, in a prospective way, phase-change material is considered to build a complete optical NoC system. By optimizing established approaches and introducing emerging technologies, NoCs clearly remain on an innovative path.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 19.3.1 | NOCEPTION: A FAST PPA PREDICTION FRAMEWORK FOR NETWORK-ON-CHIPS USING GRAPH NEURAL NETWORK Speaker: Fuping Li, Institute of Computing Technology, CN Authors: Fuping Li, Ying Wang, Cheng Liu, Huawei Li and Xiaowei Li, Institute of Computing Technology, Chinese Academy of Sciences, CN Abstract Network-on-chips (NoCs) have been viewed as a promising alternative to traditional on-chip communication architecture for the increasing number of IPs in modern chips. To support the vast design space exploration of application-specific NoC characteristics with arbitrary topologies, in this paper, we propose a fast estimation framework to predict power, performance, and area (PPA) of NoCs based on graph neural networks (GNNs). We present a general way of modeling the application and the NoC with user-defined parameters as an attributed graph, which can be learned by the GNN model. Experimental results show that on the unseen realistic applications, the proposed method achieves the accuracy of 97.36% on power estimation, 97.83% on area estimation, and improves the accuracy of the network-level and system-level performance predictor over the topology-constrained baseline method by 6.52% and 4.73% respectively. |
16:44 CET | 19.3.2 | (Best Paper Award Candidate) AN EASY-TO-IMPLEMENT AND EFFICIENT FLOW CONTROL FOR DEADLOCK-FREE ADAPTIVE ROUTING Speaker: Yi Dai, National University of Defense Technology, CN Authors: Yi Dai, Kai Lu, Sheng Ma and Junsheng Chang, National University of Defense Technology, CN Abstract Deadlock-free adaptive routing is extensively adopted in interconnection networks to improve communication bandwidth and reduce latency. However, existing deadlock-free flow control schemes either underutilize memory resources due to inefficient buffer management for simple hardware implementations, or rely on complicated coordination and synchronization mechanisms with high hardware complexity. In this work, we solve the deadlock problem from a different perspective by considering the deadlock as a lack of credit. With minor modifications of the credit accumulation procedure, our proposed full-credit flow control (FFC) ensures atomic buffer usage only based on local credit status while making full use of the buffer space. FFC can be easily integrated in the industrial router to achieve deadlock freedom with less area and power consumption, but 112% higher throughput, compared to the critical bubble scheme (CBS). We further propose a credit reservation strategy to eliminate the escape virtual channel (VC) cost for fully adaptive routing implementation. The synthesizing results demonstrate that FFC along with credit reservation (FFC-CR) can effectively reduce the area by 29% and power consumption by 26% compared with CBS. |
16:48 CET | 19.3.3 | DEFT: A DEADLOCK-FREE AND FAULT-TOLERANT ROUTING ALGORITHM FOR 2.5D CHIPLET NETWORKS Speaker: Ebadollah Taheri, Colorado State University, US Authors: Ebadollah Taheri, Sudeep Pasricha and Mahdi Nikdast, Colorado State University, US Abstract By interconnecting smaller chiplets through an interposer, 2.5D integration offers a cost-effective and high-yield solution to implement large-scale modular systems. Nevertheless, the underlying network is prone to deadlock, despite deadlock-free chiplets, and to different faults on the vertical links used for connecting the chiplets to the interposer. Unfortunately, existing fault-tolerant routing techniques proposed for 2D and 3D on-chip networks cannot be applied to chiplet networks. To address these problems, this paper presents the first deadlock-free and fault-tolerant routing algorithm, called DeFT, for 2.5D integrated chiplet systems. DeFT improves the redundancy in vertical-link selection to tolerate faults in vertical links while considering network congestion. Moreover, DeFT can tolerate different vertical-link-fault scenarios while accounting for vertical-link utilization. Compared to the state-of-the-art routing algorithms in 2.5D chiplet systems, our simulation results show that DeFT improves network reachability by up to 75% with a fault rate of up to 25% and reduces the network latency by up to 40% for multi-application execution scenarios with less than 2% area overhead. |
16:52 CET | 19.3.4 | NON-VOLATILE PHASE CHANGE MATERIAL BASED NANOPHOTONIC INTERCONNECT Speaker: Parya Zolfaghari, Concordia University, CA Authors: Parya Zolfaghari1, Joel Ortiz2, Cedric Killian2 and Sébastien Le Beux3 1Concordia University, CA; 2University of Rennes 1, Inria, CNRS/IRISA Lannion, FR; 3Department of Electrical & Computer Engineering Concordia University, CA Abstract Integrated optics is a promising technology to take advantage of light propagation for high throughput chip-scale interconnects in many core architectures. A key challenge for the deployment of nanophotonic interconnects is their high static power, which is induced by signal losses and device calibration. To tackle this challenge, we propose to use Phase Change Material (PCM) to configure optical paths between writers and readers. The non-volatility of PCM elements and the high contrast between crystalline and amorphous phase states allow unused readers to be bypassed, thus reducing losses and calibration requirements. We evaluate the efficiency of the proposed PCM-based interconnects using system-level simulations carried out with the SNIPER manycore simulator. For this purpose, we have modified the simulator to partition clusters according to executed applications. Simulation results show that bypassing readers using PCM leads to up to 52% communication power savings. |
16:56 CET | 19.3.5 | Q&A SESSION Authors: Romain Lemaire1 and Sébastien Le Beux2 1CEA-List, FR; 2Concordia University, CA Abstract Questions and answers with the authors |
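Presentation 19.3.1 models the application and the NoC with user-defined parameters as an attributed graph that a GNN can learn from. The sketch below builds such a graph for a small mesh using plain dictionaries; the chosen node and edge attributes are assumptions for illustration, since the paper's exact feature set is not reproduced here.

```python
def mesh_noc_graph(rows, cols, buffer_depth=4, link_width=128):
    """Build an attributed graph for a rows x cols mesh NoC (illustrative features only)."""
    nodes, edges = {}, []
    for r in range(rows):
        for c in range(cols):
            nodes[r * cols + c] = {"buffer_depth": buffer_depth, "ports": 0}
    for r in range(rows):
        for c in range(cols):
            nid = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):          # east and south neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    nbr = rr * cols + cc
                    edges.append((nid, nbr, {"width_bits": link_width}))
                    edges.append((nbr, nid, {"width_bits": link_width}))
                    nodes[nid]["ports"] += 1
                    nodes[nbr]["ports"] += 1
    return nodes, edges

nodes, edges = mesh_noc_graph(2, 2)
print(len(nodes), len(edges))  # 4 routers, 8 directed links
```

A GNN-based PPA predictor would consume such node and edge attribute dictionaries (typically converted to feature tensors) together with an application traffic graph.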
19.4 Emerging devices for new computing paradigms
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Georgiev Vihar, University of Glasgow, GB
Session co-chair:
Gabriele Boschetto, CNRS-LIRMM, FR
This session covers new computing ideas and approaches. The first paper describes the impact of reliability on the performance of photonic neural networks. The next paper shows how parallel circuit execution can be applied in the quantum computing domain. The session also covers work on optical logic circuits for area and power reduction, a ternary processor (a short balanced-ternary encoding sketch follows this session's table), and the application of the Ising model to solving the traveling salesman problem.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 19.4.1 | A RELIABILITY CONCERN ON PHOTONIC NEURAL NETWORKS Speaker: Yinyi Liu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, HK Authors: Yinyi Liu1, Jiaxu Zhang1, Jun Feng1, Shixi Chen1 and Jiang Xu2 1Electronic and Computer Engineering Department, The Hong Kong University of Science and Technology, HK; 2Microelectronics Thrust, Electronic and Computer Engineering Department, AI Chip Center for Emerging Smart Systems, The Hong Kong University of Science and Technology, HK Abstract Emerging integrated photonic neural networks have experimentally proved to achieve an ultra-high speedup of deep neural network training and inference in the optical domain. However, photonic devices suffer from inherent crosstalk noise and loss, inevitably leading to reliability concerns. This paper systematically analyzes the impacts of crosstalk and loss on photonic computing systems. We propose a crosstalk-aware model for reliability estimation and find out the worst-case bounds as we increase the footprints and scales of the photonic chips. Our evaluations show that -30dB crosstalk noise can cause the maximal photonic chip integration scale to drop sharply, by 109x. To facilitate very-large-scale photonic integration for future computing, we further propose multiple heterogeneous bijou photonic-cores to address the crosstalk-aware reliability concern. |
16:44 CET | 19.4.2 | HOW PARALLEL CIRCUIT EXECUTION CAN BE USEFUL FOR NISQ COMPUTING? Speaker: Siyuan Niu, LIRMM, University of Montpellier, FR Authors: Siyuan Niu1 and Aida Todri-Sanial2 1LIRMM, University of Montpellier, FR; 2LIRMM, University of Montpellier, CNRS, FR Abstract Quantum computing is performed on Noisy Intermediate-Scale Quantum (NISQ) hardware in the short term. Only small circuits can be executed reliably on a quantum machine due to the unavoidable noisy quantum operations on NISQ devices, leading to the under-utilization of hardware resources. With the growing demand to access quantum hardware, how to utilize it more efficiently while maintaining output fidelity is becoming a timely issue. A parallel circuit execution technique has been proposed to address this problem by executing multiple programs on hardware simultaneously. It can improve the hardware throughput and reduce the overall runtime. However, accumulative noises such as crosstalk can decrease the output fidelity in parallel workload execution. In this paper, we first give an in-depth overview of state-of-the-art parallel circuit execution methods. Second, we propose a Quantum Crosstalk-aware Parallel workload execution method (QuCP) without the overhead of crosstalk characterization. Third, we investigate the trade-off between hardware throughput and fidelity loss to explore the hardware limitation with parallel circuit execution. Finally, we apply parallel circuit execution to VQE and zero-noise extrapolation error mitigation method to showcase its various applications on advancing NISQ computing. |
16:48 CET | 19.4.3 | SPACE AND POWER REDUCTION IN BDD-BASED OPTICAL LOGIC CIRCUITS EXPLOITING DUAL PORTS Speaker: Ryosuke Matsuo, Kyoto University, JP Authors: Ryosuke Matsuo and Shin-ichi Minato, Kyoto University, JP Abstract Optical logic circuits based on integrated nanophotonics have attracted significant interest due to their ultra-high-speed operation. A synthesis method based on the Binary Decision Diagram (BDD) has been studied, as BDD-based optical logic circuits can take advantage of the speed of light. However, a fundamental disadvantage of BDD-based optical logic circuits is a large number of splitters, which results in large power consumption. In BDD-based circuits, the dual port of each logic gate is not used. We propose a method for eliminating a splitter by exploiting this dual port. We define a BDD node corresponding to a dual port as a dual port node (DP node) and call the proposed method DP node sharing. We demonstrate that DP node sharing significantly reduces the power consumption and, to a lesser extent, the circuit size, without increasing delay. We conducted an experiment involving 10-input logic functions obtained by applying an LUT technology mapper to an ISCAS'85 C7552 benchmark circuit to evaluate our DP node sharing. The experimental results demonstrate that DP node sharing reduces the power consumption by two orders of magnitude for circuits that consume a large amount of power. |
16:52 CET | 19.4.4 | DESIGN AND EVALUATION FRAMEWORKS FOR ADVANCED RISC-BASED TERNARY PROCESSOR Speaker: Dongyun Kam, Pohang University of Science and Technology, KR Authors: Dongyun Kam, Jung Gyu Min, Jongho Yoon, Sunmean Kim, Seokhyeong Kang and Youngjoo Lee, Pohang University of Science and Technology, KR Abstract In this paper, we introduce the design and verification frameworks for developing a fully-functional emerging ternary processor. Based on the existing compiling environments for binary processors, for the given ternary instructions, the software-level framework provides an efficient way to convert the given programs to the ternary assembly codes. We also present a hardware-level framework to rapidly evaluate the performance of a ternary processor implemented in arbitrary design technology. As a case study, the fully-functional 9-trit advanced RISC-based ternary (ART-9) core is newly developed by using the proposed frameworks. Utilizing 24 custom ternary instructions, the 5-stage ART-9 prototype architecture is successfully verified by a number of test programs including dhrystone benchmark in a ternary domain, achieving the processing efficiency of 57.8 DMIPS/W and 3.06 x 10^6 DMIPS/W in the FPGA-level ternary-logic emulations and the emerging CNTFET ternary gates, respectively. |
16:56 CET | 19.4.5 | Q&A SESSION Authors: Vihar Georgiev1 and Gabriele Boschetto2 1University of Glasgow, GB; 2CNRS-LIRMM, FR Abstract Questions and answers with the authors |
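Presentation 19.4.4 develops a processor that operates on trits rather than bits. As background only, the sketch below converts integers to and from balanced-ternary digits (-1, 0, +1); the ART-9 instruction set and its actual number representation are not reproduced here, so this is purely an illustration of ternary encoding.

```python
def to_balanced_ternary(n, width=9):
    """Encode an integer as 'width' balanced-ternary trits (-1, 0, +1), LSB first."""
    trits = []
    for _ in range(width):
        n, r = divmod(n, 3)
        if r == 2:          # digit 2 becomes -1 with a carry into the next trit
            r, n = -1, n + 1
        trits.append(r)
    if n != 0:
        raise OverflowError("value does not fit in the given number of trits")
    return trits

def from_balanced_ternary(trits):
    """Decode LSB-first balanced-ternary trits back to an integer."""
    return sum(t * (3 ** i) for i, t in enumerate(trits))

v = 2022
t = to_balanced_ternary(v)
assert from_balanced_ternary(t) == v
print(t)  # nine trits covering the range -9841 .. 9841
```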
19.5 Dealing with Correct Design and Robustness analysis for Complex Systems, MPSoCs and Circuits
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Chung-Wei Lin, National Taiwan University, TW
Session co-chair:
Dionisios N. Pnevmatikatos, NTUA, GR
This session contains two parts; the first is dedicated to complex systems, where the presence of many different subsystems is an important concern. The first paper addresses the global robustness of deep neural networks (a sampling-based sketch of this property follows this session's table). The second paper tackles the problem of planning communications in wireless networks with energy harvesting. The second part presents two industrial briefs on different levels of chip design: one is technology-oriented and focuses on transistor-level design, while the other deals with statistical analysis for the robustness of MPSoCs. Both briefs provide detailed results on performance or energy.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 19.5.1 | REVISITING PASS-TRANSISTOR LOGIC STYLES IN A 12NM FINFET TECHNOLOGY NODE Speaker: Jan Lappas, TU Kaiserslautern, DE Authors: Jan Lappas1, André Chinazzo1, Christian Weis2, Chenyang Xia3, Zhihang Wu3, Leibin Ni3 and Norbert Wehn2 1TU Kaiserslautern, DE; 2University of Kaiserslautern, DE; 3Huawei Technologies Co., Ltd., CN Abstract With the slow-down of Moore’s law and the increasing requirements on energy efficiency, alternative logic styles compared to complementary static CMOS have to be revisited for digital circuit implementations. Pass Transistor Logic (PTL) gained much attention in the ‘90s, however, only a limited number of recent investigations and publications regarding PTL exist that use advanced technology nodes. This paper compares key performance metrics of 22 different PTL based 1-bit full adder designs to a complementary static CMOS logic reference, using a recent 12nm FinFET technology. The figures of merit are the propagation delay, the energy consumption, and the energy-delay-product (EDP). Our investigations show that PTL based adder circuits can have an up to 49% decreased delay and a 48% and 63% reduced energy consumption and EDP, respectively, compared to a state-of-the-art complementary CMOS logic reference. In addition, we analyzed the impact of PVT variations on the delay for selected PTL full adder designs. |
16:44 CET | 19.5.2 | SAFESU-2: A SAFE STATISTICS UNIT FOR SPACE MPSOCS Speaker: Guillem Cabo, Barcelona Supercomputing Center, ES Authors: Guillem Cabo1, Sergi Alcaide2, Carles Hernandez3, Pedro Benedicte1, Francisco Bas1, Fabio Mazzocchetti1 and Jaume Abella1 1Barcelona Supercomputing Center, ES; 2Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES; 3Universitat Politecnica de Valencia, ES Abstract Advanced statistics units (SUs) have been proven effective for the verification, validation and implementation of safety measures as part of safety-related MPSoCs. This is the case, for instance, of the RISC-V MPSoC by Cobham Gaisler based on NOEL-V cores that will become commercially ready by the end of 2022. However, while those SUs support safety in the rest of the SoC, they must be built to be safe to be deployed in real products. This paper presents the SafeSU-2, the safety-compliant version of the SafeSU. In particular, we develop the safety concept of the SafeSU for relevant fault models, and implement fault detection and fault tolerance features needed to make it compliant with the requirements of safety-related devices in general, and of space MPSoCs in particular. |
16:48 CET | 19.5.3 | (Best Paper Award Candidate) EFFICIENT GLOBAL ROBUSTNESS CERTIFICATION OF NEURAL NETWORKS VIA INTERLEAVING TWIN-NETWORK ENCODING Speaker: Zhilu Wang, Northwestern University, US Authors: Zhilu Wang1, Chao Huang2 and Qi Zhu1 1Northwestern University, US; 2University of Liverpool, Northwestern University, GB Abstract The robustness of deep neural networks has received significant interest recently, especially when being deployed in safety-critical systems, as it is important to analyze how sensitive the model output is under input perturbations. While most previous works focused on the local robustness property around an input sample, the studies of the global robustness property, which bounds the maximum output change under perturbations over the entire input space, are still lacking. In this work, we formulate the global robustness certification for neural networks with ReLU activation functions as a mixed-integer linear programming (MILP) problem, and present an efficient approach to address it. Our approach includes a novel interleaving twin-network encoding scheme, where two copies of the neural network are encoded side-by-side with extra interleaving dependencies added between them, and an over-approximation algorithm leveraging relaxation and refinement techniques to reduce complexity. Experiments demonstrate the timing efficiency of our work when compared with previous global robustness certification methods and the tightness of our over-approximation. A case study of closed-loop control safety verification is conducted, and demonstrates the importance and practicality of our approach for certifying the global robustness of neural networks in safety-critical systems. |
16:52 CET | 19.5.4 | OPPORTUNISTIC COMMUNICATION WITH LATENCY GUARANTEES FOR INTERMITTENTLY-POWERED DEVICES Speaker: Kacper Wardega, Boston University, US Authors: Kacper Wardega1, Wenchao Li1, Hyoseung Kim2, Yawen Wu3, Zhenge Jia3 and Jingtong Hu3 1Boston University, US; 2University of California, Riverside, US; 3University of Pittsburgh, US Abstract Energy-harvesting wireless sensor nodes have found widespread adoption due to their low cost and small form factor. However, uncertainty in the available power supply introduces significant challenges in engineering communications between intermittently- powered nodes. We propose a constraint-based model for energy harvests that together with a hardware model can be used to enable safe, opportunistic communication with worst-case latency guarantees. We show that greedy approaches that attempt communication whenever energy is available lead to prolonged latencies in real-world environments. Our approach offers bounded worst-case latency while providing a performance improvement over a conservative, offline approach planned around the worst-case energy harvest. |
16:56 CET | 19.5.5 | Q&A SESSION Authors: Chung-Wei Lin1 and Dionisios Pnevmatikatos2 1National Taiwan University, TW; 2School of ECE, National TU Athens & FORTH-ICS, GR Abstract Questions and answers with the authors |
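Presentation 19.5.3 certifies global robustness, i.e., a bound on the maximum output change over all input pairs within a given perturbation, exactly via an MILP over an interleaving twin-network encoding. The sketch below is only a sampling-based lower-bound proxy for the same quantity on a toy ReLU network; the network, perturbation bound, and sample count are illustrative assumptions, not the paper's method.

```python
import numpy as np

def relu_net(x, weights, biases):
    """Tiny fully-connected ReLU network used only for this illustration."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(W @ x + b, 0.0)
    return weights[-1] @ x + biases[-1]

def sampled_global_robustness(weights, biases, dim, eps, n_samples=10000, seed=0):
    """Lower-bound estimate of max |f(x) - f(x')| over pairs with ||x - x'||_inf <= eps.

    An exact certificate (as in the paper) requires an MILP over a twin-network
    encoding; random sampling can only under-approximate the true global bound.
    """
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0, dim)
        x2 = x + rng.uniform(-eps, eps, dim)
        delta = np.max(np.abs(relu_net(x, weights, biases) - relu_net(x2, weights, biases)))
        worst = max(worst, delta)
    return worst

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]
bs = [np.zeros(8), np.zeros(2)]
print(sampled_global_robustness(Ws, bs, dim=4, eps=0.1))
```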
20.1 Panel: The Good, the Bad and the Trendy of Multi-Partner Research Projects in Europe
Add this session to my calendar
Date: Tuesday, 22 March 2022
Time: 18:00 CET - 20:30 CET
Session chair:
Lorena Anghel, Grenoble INP, FR
Session co-chair:
Maksim Jenihhin, Tallinn University of Technology, EE
Panellists:
Yves Gigase, KDT Joint Undertaking, BE
Anton Chichkov, KDT Joint Undertaking, BE
Daniel Watzenig, Virtual Vehicle Research GmbH, AT
Said Hamdioui, Delft University of Technology, NL
Peter Hofmann, Deutsche Telekom Security, DE
Christoph Grimm, TU Kaiserslautern, DE
Dirk Pflueger, University of Stuttgart, DE
The panel establishes an open discussion of opportunities and approaches to collaborative research and innovation in Europe. In addition, it features an invited talk by Yves Gigase entitled “KDT JU and the Chips Act: Opportunities for the DATE Community”. The panel speakers include representatives of the European Commission and distinguished experts in multi-partner research, notably representatives of the projects CARAMEL, GENIAL! and GS-IMTR. The online live debate will address the balance between blue-sky and applied research in Europe, the next killer trends, the protection of EU interests, challenges exacerbated by COVID-19, and a number of other exciting questions.
IP.3_1 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer to questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_1.1 | REDMULE: A COMPACT FP16 MATRIX-MULTIPLICATION ACCELERATOR FOR ADAPTIVE DEEP LEARNING ON RISC-V-BASED ULTRA-LOW-POWER SOCS Speaker: Yvan Tortorella, University of Bologna, IT Authors: Yvan Tortorella1, Luca Bertaccini2, Davide Rossi3, Luca Benini4 and Francesco Conti1 1University of Bologna, IT; 2ETH Zürich, CH; 3University Of Bologna, IT; 4Università di Bologna and ETH Zürich, IT Abstract The fast proliferation of extreme-edge applications using Deep Learning (DL) based algorithms required dedicated hardware to satisfy extreme-edge applications’ latency, throughput, and precision requirements. While inference is achievable in practical cases, online finetuning and adaptation of general DL models are still highly challenging. One of the key stumbling stones is the need for parallel floating-point operations, which are considered unaffordable on sub-100mW extreme-edge SoCs. We tackle this problem with RedMulE (Reduced-precision matrix Multiplication Engine), a parametric low-power hardware accelerator for FP16 matrix multiplications - the main kernel of DL training and inference - conceived for tight integration within a cluster of tiny RISC-V cores based on the PULP (Parallel Ultra-Low-Power) architecture. In 22nm technology, a 32-FMA RedMulE instance occupies just 0.07mm^2 (14% of an 8-core RISC-V cluster) and achieves up to 666MHz maximum operating frequency, for a throughput of 31.6 MAC/cycle (98.8% utilization). We reach a cluster-level power consumption of 43.5mW and a full-cluster energy efficiency of 688 16-bit GFLOPS/W. Overall, RedMulE features up to 4.65× higher energy efficiency and 22× speedup over SW execution on 8 RISC-V cores. |
IP.3_1.2 | INCREASING CELLULAR NETWORK ENERGY EFFICIENCY FOR RAILWAY CORRIDORS Speaker: Adrian Schumacher, Swisscom (Switzerland) Ltd., CH Authors: Adrian Schumacher1, Ruben Merz1 and Andreas Burg2 1Swisscom (Switzerland) Ltd., CH; 2EPFL-TCL, CH Abstract Modern trains act as Faraday cages making it challenging to provide high cellular data capacities to passengers. A solution is the deployment of linear cells along railway tracks, forming a cellular corridor. To provide a sufficiently high data capacity, many cell sites need to be installed at regular distances. However, such cellular corridors with high power sites in short distance intervals are not sustainable due to the infrastructure power consumption. To render railway connectivity more sustainable, we propose to deploy fewer high-power radio units with intermediate low-power support repeater nodes. We show that these repeaters consume only 5% of the energy of a regular cell site and help to maintain the same data capacity in the trains. In a further step, we introduce a sleep mode for the repeater nodes that enables autonomous solar powering and even eases installation because no cables to the relays are needed. |
IP.3_1.3 | HEALTH MONITORING OF MILLING TOOLS UNDER DISTINCT OPERATING CONDITIONS BY A DEEP CONVOLUTIONAL NEURAL NETWORK MODEL Speaker: Priscile Suawa, Brandenburg TU, Cottbus–Senftenberg, DE Authors: Priscile Suawa and Michael Hübner, Brandenburg TU Cottbus, DE Abstract One of the most popular manufacturing techniques is milling. It can be used to make a variety of geometric components, such as flat grooves, surfaces, etc. The condition of the milling tool has a major impact on the quality of milling processes, hence the importance of monitoring it. When working on monitoring solutions, it is crucial to take into account different operating variables, such as rotational speed, especially in real-world settings. This work addresses the topic of predictive maintenance by exploiting the fusion of sensor data and the artificial intelligence-based analysis of signals measured by sensors. With a set of data such as vibration and sound reflection from the sensors, we focus on finding solutions for the task of detecting the health condition of machines. A Deep Convolutional Neural Network (DCNN) model is provided with fusion at the sensor data level to detect five consecutive health states of a milling tool, from a healthy state to a degraded state. In addition, a demonstrator is built with Simulink to simulate and visualize the detection process. To examine the capacity of our model, the signal data was processed individually and subsequently merged. Experiments were carried out on three sets of data recorded during a real milling process. Results using the proposed DCNN architecture with raw data reached an accuracy of more than 94% for all data sets. |
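IP.3_1.3 fuses vibration and sound data at the sensor-data level before a deep CNN classifies the milling tool's health state. One plausible form of such fusion is sketched below, stacking the two signals as input channels of a small 1-D CNN (PyTorch); the architecture, window length, and five-class output are assumptions for illustration, not the authors' model.

```python
import torch
import torch.nn as nn

class TinyFusionCNN(nn.Module):
    """Illustrative 1-D CNN over channel-stacked vibration + sound windows."""
    def __init__(self, n_classes=5, window=1024):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.classifier = nn.Linear(32 * (window // 16), n_classes)

    def forward(self, x):                      # x: (batch, 2, window)
        h = self.features(x)
        return self.classifier(h.flatten(1))

# Fuse by stacking the two sensor streams as channels of one input tensor.
vibration = torch.randn(8, 1, 1024)
sound = torch.randn(8, 1, 1024)
fused = torch.cat([vibration, sound], dim=1)   # (8, 2, 1024)
logits = TinyFusionCNN()(fused)
print(logits.shape)                            # torch.Size([8, 5])
```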
IP.3_2 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer to questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_2.1 | GRADIENT-BASED BIT ENCODING OPTIMIZATION FOR NOISE-ROBUST BINARY MEMRISTIVE CROSSBAR Speaker: Youngeun Kim, Yale University, US Authors: Youngeun Kim1, Hyunsoo Kim2, Seijoon Kim3, Sang Joon Kim2 and Priyadarshini Panda1 1Yale University, US; 2Samsung Advanced Institute of Technology, KR; 3Seoul National University, KR Abstract Binary memristive crossbars have gained huge attention as an energy-efficient deep learning hardware accelerator. Nonetheless, they suffer from various noises due to the analog nature of the crossbars. To overcome such limitations, most previous works train weight parameters with noise data obtained from a crossbar. These methods are, however, ineffective because it is difficult to collect noise data in large-volume manufacturing environment where each crossbar has a large device/circuit level variation. Moreover, we argue that there is still room for improvement even though these methods somewhat improve accuracy. This paper explores a new perspective on mitigating crossbar noise in a more generalized way by manipulating input binary bit encoding rather than training the weight of networks with respect to noise data. We first mathematically show that the noise decreases as the number of binary bit encoding pulses increases when representing the same amount of information. In addition, we propose Gradient-based Bit Encoding Optimization (GBO) which optimizes a different number of pulses at each layer, based on our in-depth analysis that each layer has a different level of noise sensitivity. The proposed heterogeneous layer-wise bit encoding scheme achieves high noise robustness with low computational cost. Our experimental results on public benchmark datasets show that GBO improves the classification accuracy by ~ 5-40% in severe noise scenarios. |
IP.3_2.2 | TAS: TERNARIZED NEURAL ARCHITECTURE SEARCH FOR RESOURCE-CONSTRAINED EDGE DEVICES Speaker: Mohammad Loni, MDH, SE Authors: Mohammad Loni1, Hamid Mousavi2, Mohammad Riazati2, Masoud Daneshtalab2 and Mikael Sjodin3 1Mälardalen University, SE; 2MDH, SE; 3Mälardalen Real-Time Research Centre, SE Abstract Ternary Neural Networks (TNNs) compress network weights and activation functions into a 2-bit representation, resulting in remarkable network compression and energy efficiency. However, there remains a significant gap in accuracy between TNNs and full-precision counterparts. Recent advances in Neural Architecture Search (NAS) promise opportunities in automated optimization for various deep learning tasks. Unfortunately, this area is unexplored for optimizing TNNs. This paper proposes TAS, a framework that drastically reduces the accuracy gap between TNNs and their full-precision counterparts by integrating quantization into the network design. We observed that directly applying NAS to the ternary domain causes accuracy degradation, as the search settings are customized for full-precision networks. To address this problem, we propose (i) a new cell template for ternary networks with maximum gradient propagation; and (ii) a novel learnable quantizer that adaptively relaxes the ternarization mechanism from the distribution of the weights and activation functions. Experimental results reveal that TAS delivers 2.64% higher accuracy and ≈2.8x memory saving over competing methods with the same bit-width resolution on the CIFAR-10 dataset. These results suggest that TAS is an effective method that paves the way for the efficient design of the next generation of quantized neural networks. |
IP.3_2.3 | EXAMINING AND MITIGATING THE IMPACT OF CROSSBAR NON-IDEALITIES FOR ACCURATE IMPLEMENTATION OF SPARSE DEEP NEURAL NETWORKS Speaker: Abhiroop Bhattacharjee, Yale University, US Authors: Abhiroop Bhattacharjee1, Lakshya Bhatnagar2 and Priyadarshini Panda1 1Yale University, US; 2IIT Delhi, IN Abstract Recently, several structured pruning techniques have been introduced for energy-efficient implementation of Deep Neural Networks (DNNs) with a smaller number of crossbars. Although these techniques claim to preserve the accuracy of sparse DNNs on crossbars, none has studied the impact of the inexorable crossbar non-idealities on the actual performance of the pruned networks. To this end, we perform a comprehensive study to show how highly sparse DNNs, that result in significant crossbar-compression-rates, can lead to severe accuracy losses compared to unpruned DNNs mapped onto non-ideal crossbars. We perform experiments with multiple structured-pruning approaches (such as C/F pruning, XCS and XRS) on VGG11 and VGG16 DNNs with benchmark datasets (CIFAR10 and CIFAR100). We propose two mitigation approaches - Crossbar-column rearrangement and Weight-Constrained-Training (WCT) - that can be integrated with the crossbar-mapping of the sparse DNNs to minimize accuracy losses incurred by the pruned models. These help in mitigating non-idealities by increasing the proportion of low conductance synapses on crossbars, thereby improving their computational accuracies. |
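To make the ternarization idea behind IP.3_2.2 concrete, the following sketch shows a common threshold-based weight ternarization, mapping weights to {-α, 0, +α}. The 0.7·mean(|W|) threshold is a heuristic borrowed from the ternary-network literature and is only an assumption here; it is not the learnable quantizer proposed by TAS.

```python
# Minimal sketch of threshold-based weight ternarization, assuming the common
# 0.7*mean(|W|) threshold heuristic. Not the TAS learnable quantizer.
import numpy as np

def ternarize(weights: np.ndarray) -> np.ndarray:
    delta = 0.7 * np.mean(np.abs(weights))            # ternarization threshold
    mask = np.abs(weights) > delta                    # weights that stay nonzero
    alpha = np.mean(np.abs(weights[mask])) if mask.any() else 0.0
    return alpha * np.sign(weights) * mask            # values in {-alpha, 0, +alpha}

w = np.random.randn(4, 4)
print(ternarize(w))
```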
IP.3_3 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_3.1 | CROSS-LEVEL PROCESSOR VERIFICATION VIA ENDLESS RANDOMIZED INSTRUCTION STREAM GENERATION WITH COVERAGE-GUIDED AGING Speaker: Niklas Bruns, University of Bremen, DE Authors: Niklas Bruns1, Vladimir Herdt2, Eyck Jentzsch3 and Rolf Drechsler4 1University of Bremen, DE; 2DFKI, DE; 3MINRES Technologies GmbH, DE; 4University of Bremen/DFKI, DE Abstract We propose a novel cross-level verification approach for processor verification at the Register-Transfer Level (RTL). The foundation is a randomized coverage-guided instruction stream generator that produces one endless and unrestricted instruction stream that evolves dynamically at runtime. We leverage an Instruction Set Simulator (ISS) as a reference model in a tight co-simulation setting. Coverage information is continuously updated based on the execution state of the ISS, and we employ Coverage-guided Aging to smooth out the coverage distribution of the randomized instruction stream over time. Our case study with an industrial pipelined 32-bit RISC-V processor demonstrates the effectiveness of our approach. |
IP.3_3.2 | HARDWARE ACCELERATION OF EXPLAINABLE MACHINE LEARNING Speaker: Prabhat Mishra, University of Florida, US Authors: Zhixin Pan and Prabhat Mishra, University of Florida, US Abstract Machine learning (ML) is successful in achieving human-level performance in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While recent efforts on explainable ML have received significant attention, the existing solutions are not applicable in real-time systems since they cast interpretability as an optimization problem, which leads to numerous iterations of time-consuming complex computations. To make matters worse, existing implementations are not amenable to hardware-based acceleration. In this paper, we propose an efficient framework to enable acceleration of the explainable ML procedure with hardware accelerators. We explore the effectiveness of both Tensor Processing Unit (TPU) and Graphics Processing Unit (GPU) based architectures in accelerating explainable ML. Specifically, this paper makes three important contributions. (1) To the best of our knowledge, our proposed work is the first attempt in enabling hardware acceleration of explainable ML. (2) Our proposed solution exploits the synergy between matrix convolution and Fourier transform, and therefore, it takes full advantage of TPU’s inherent ability in accelerating matrix computations. (3) Our proposed approach can lead to real-time outcome interpretation. Extensive experimental evaluation demonstrates that the proposed approach deployed on a TPU can provide drastic improvement in interpretation time (39x on average) as well as energy efficiency (69x on average) compared to existing acceleration techniques. |
IP.3_3.3 | FAST SIMULATION OF FUTURE 128-BIT ARCHITECTURES Speaker: Frédéric Pétrot, University Grenoble Alpes, Grenoble INP, FR Authors: Fabien Portas1 and Frédéric Pétrot2 1TIMA lab, University Grenoble Alpes, CNRS, Grenoble-INP, FR; 2TIMA Lab, Université Grenoble Alpes, FR Abstract Whether 128-bit architectures will some day hit the market or not is an open question. There is however a trend towards that direction: virtual addresses grew from 34 to 48 bits in 1999 and then to 57 bits in 2019. The impact of a virtually infinite addressable space on software is hard to predict, but it will most likely be major. Simulation tools are therefore needed to support research and experimentation for tooling and software. In this paper, we present the implementation of the 128-bit extension of the RISC-V architecture in the QEMU functional simulator and report first performance evaluations. On our limited set of programs, simulation is slowed down by a factor of at worst 5 compared to 64-bit simulation, making the tool still usable for executing large software codes. |
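The "synergy between matrix convolution and Fourier transform" mentioned in IP.3_3.2 refers to the convolution theorem: a convolution can be computed as an element-wise product in the frequency domain. The NumPy sketch below illustrates that identity only; it is not the paper's TPU implementation.

```python
# Sketch of the convolution theorem: 2-D convolution == element-wise
# multiplication in the Fourier domain (after zero-padding to the full size).
import numpy as np

def conv2d_fft(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    s = (image.shape[0] + kernel.shape[0] - 1,
         image.shape[1] + kernel.shape[1] - 1)       # full linear-convolution size
    return np.real(np.fft.ifft2(np.fft.fft2(image, s) * np.fft.fft2(kernel, s)))

img = np.random.rand(32, 32)
ker = np.random.rand(3, 3)
out = conv2d_fft(img, ker)                            # shape (34, 34)
```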
IP.3_4 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_4.1 | A GENERATIVE AI FOR HETEROGENEOUS NETWORK-ON-CHIP DESIGN SPACE PRUNING Speaker: Maxime France-Pillois, LIRMM, FR Authors: Maxime Mirka1, Maxime France-Pillois1, Gilles Sassatelli2 and Abdoulaye Gamatie3 1LIRMM CNRS / University of Montpellier, FR; 2LIRMM CNRS / University of Montpellier 2, FR; 3CNRS LIRMM / University of Montpellier, FR Abstract Often suffering from under-optimization, Networks-on-Chip (NoCs) heavily impact the efficiency of domain-specific Systems-on-Chip. To cope with this issue, heterogeneous NoCs are promising alternatives. Nevertheless, the design of optimized NoCs satisfying multiple performance objectives, e.g. throughput, power and area, is extremely challenging and requires significant expertise. While some approaches have been proposed to deal with the design space of NoCs, most fail to meet some expectations such as tractable exploration time and handling of multi-objective optimization. In this paper, we propose an approach based on generative artificial intelligence to help pruning complex design spaces for heterogeneous NoCs, according to configurable performance objectives. This is made possible by the ability of Generative Adversarial Networks to learn and generate relevant design candidates for the target NoCs. The speed and flexibility of our solution enable a fast generation of optimized NoCs that fit users' expectations. Through some experiments, we show how to obtain competitive NoC designs reducing the power consumption with no communication performance or area penalty compared to a given conventional NoC design. |
IP.3_4.2 | SPARROW: A LOW-COST HARDWARE/SOFTWARE CO-DESIGNED SIMD MICROARCHITECTURE FOR AI OPERATIONS IN SPACE PROCESSORS Speaker: Marc Solé Bonet, Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES Authors: Marc Solé Bonet and Leonidas Kosmidis, Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES Abstract Recently there is an increasing interest in the use of artificial intelligence for on-board processing as indicated by the latest space missions, which cannot be satisfied by existing low-performance space-qualified processors. Although COTS AI accelerators can provide the required performance, they are not designed to meet space requirements. In this work, we co-design a low-cost SIMD micro-architecture integrated in a space qualified processor, which can significantly increase its performance. Our solution has no impact on the processor's 100 MHz frequency and consumes minimal area thanks to its innovative design compared to conventional vector micro-architectures. For the minimum configuration of our baseline space processor, our results indicate a performance boost of up to 9.3x for commonly used AI-related and image processing algorithms and 5.5x faster for a complex, space-relevant inference application with just 30% area increase. |
IP.3_4.3 | A PLUGGABLE VECTOR UNIT FOR RISC-V VECTOR EXTENSION Speaker: Vincenzo Maisto, Hensoldt Cyber GmbH, and University of Naples Federico II, IT Authors: Vincenzo Maisto1 and Alessandro Cilardo2 1University of Naples Federico II and Hensoldt Cyber GmbH, IT; 2University of Naples Federico II, IT Abstract Vector extensions have become increasingly important for accelerating data-parallel applications in areas like multimedia, data-streaming, and Machine Learning. This interactive presentation introduces a microarchitectural design of a vector unit compliant with the RISC-V vector extension v1.0. While we targeted a specific core for demonstration, CVA6, our architecture is designed so as to ensure extensibility, maintainability, and re-usability in other cores. Furthermore, as a distinctive feature, we support speculative execution and precise vector traps. The paper provides an overview of the main motivation, design choices, and implementation details, followed by a qualitative and quantitative discussion of the results collected from the synthesis of the extended CVA6 RISC-V core. |
IP.3_5 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_5.1 | ROBUST RECONFIGURABLE SCAN NETWORKS Speaker: Natalia Lylina, University of Stuttgart, DE Authors: Natalia Lylina, Chih-Hao Wang and Hans-Joachim Wunderlich, University of Stuttgart, DE Abstract Reconfigurable Scan Networks (RSNs) access the evaluation results from embedded instruments and control their operation throughout the device lifetime. At the same time, a single fault in an RSN may dramatically reduce the accessibility of the instruments. During post-silicon validation, it may prevent extracting the complete data from a device. During online operation, the inaccessibility of runtime-critical instruments via a defective RSN may eventually result in a system failure. This paper addresses both scenarios above by presenting robust RSNs. We show that by making a small number of carefully selected spots in RSNs more robust, the entire access mechanism becomes significantly more reliable. A flexible cost function assesses the importance of specific control primitives for the overall accessibility of the instruments. Following the cost function, a minimized number of spots is hardened against permanent faults. All the critical instruments as well as most of the remaining instruments are accessible through the resulting RSNs even in the presence of defects. In contrast to state-of-the-art fault-tolerant RSNs, the presented scheme does not change the RSN topology and needs less hardware overhead. Selective hardening is formulated as a multi-objective optimization problem and solved by using an evolutionary algorithm. The experimental results validate the efficiency and the scalability of the approach. |
IP.3_5.2 | SYNCLOCK: RF TRANSCEIVER SECURITY USING SYNCHRONIZATION LOCKING Speaker: Alan Rodrigo Díaz Rizo, Sorbonne University, CNRS, LIP6, FR Authors: Alan Rodrigo Díaz Rizo, Hassan Aboushady and Haralampos-G. Stratigopoulos, Sorbonne Université, CNRS, LIP6, FR Abstract We present an anti-piracy locking-based design methodology for RF transceivers, called SyncLock. SyncLock acts on the synchronization of the transmitter with the receiver. If a key other than the secret one is applied, the synchronization and, thereby, the communication fail. SyncLock is implemented using a novel locking concept. A hard-coded error is hidden in the design while the unlocking, i.e., the error correction, takes place at another part of the design upon application of the secret key. SyncLock presents several advantages. It is generally applicable, incorrect keys result in denial-of-service, it incurs no performance penalty and minimal overheads, and it offers maximum security, thwarting all known counter-attacks. We demonstrate SyncLock with hardware measurements. |
IP.3_5.3 | DEEP REINFORCEMENT LEARNING FOR ANALOG CIRCUIT STRUCTURE SYNTHESIS Speaker: Zhenxin Zhao, Memorial University of Newfoundland, CA Authors: Zhenxin Zhao and Lihong Zhang, Memorial University of Newfoundland, CA Abstract This paper presents a novel deep-reinforcement-learning-based method for analog circuit structure synthesis. It behaves like a designer, who learns from trials, derives design knowledge and experience, and evolves gradually to eventually figure out a way to construct circuit structures that can meet the given design specifications. Necessary design rules are defined and applied to set up the specialized environment of reinforcement learning in order to reasonably construct circuit structures. The produced circuit structures are then verified by the simulation-in-loop sizing. In addition, hash table and symbolic analysis techniques are employed to significantly promote the evaluation efficiency. Our experimental results demonstrate the sound efficiency, strong reliability, and wide applicability of the proposed method. |
IP.3_6 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_6.1 | COMPATIBILITY CHECKING FOR AUTONOMOUS LANE-CHANGING ASSISTANCE SYSTEMS Speaker: Chung-Wei Lin, National Taiwan University, TW Authors: Po-Yu Huang1, Kai-Wei Liu1, Zong-Lun Li1, Sanggu Park2, Edward Andert2, Chung-Wei Lin1 and Aviral Shrivastava2 1National Taiwan University, TW; 2Arizona State University, US Abstract Different types of lane-changing assistance systems are usually developed separately by different automotive makers or suppliers. A lane-changing model can meet its own requirements, but it may be incompatible with another lane-changing model. In this paper, we verify if two lane-changing models are compatible so that the two corresponding vehicles on different lanes can exchange their lanes successfully. We propose a methodology and an algorithm to perform the verification on the combinations of four lane-changing models. Experimental results demonstrate the compatibility (or incompatibility) between the models. The verification results can be utilized during runtime to prevent incompatible vehicles from entering a lane-changing road segment. To the best of our knowledge, this is the first work considering the compatibility issue for lane-changing models. |
IP.3_6.2 | PAXC: A PROBABILISTIC-ORIENTED APPROXIMATE COMPUTING METHODOLOGY FOR ANN LEARNING Speaker: Pengfei Huang, Nanjing University of Aeronautics and Astronautics, CN Authors: Pengfei Huang, Chenghua Wang, Ke Chen and Weiqiang Liu, Nanjing University of Aeronautics and Astronautics, CN Abstract In spite of the rapidly increasing number of approximate designs in the circuit logic stack for Artificial Neural Network (ANN) learning, a principled and systematic approximate hardware methodology incorporating domain knowledge is still lacking. As the layers of an ANN become deeper, the errors introduced by approximate hardware accumulate quickly, which can lead to unexpected results. In this paper, we propose a probabilistic-oriented approximate computing (PAxC) methodology based on the notion of approximate probability to overcome the conceptual and computational difficulties inherent to probabilistic ANN learning. The PAxC makes use of minimum likelihood error at both the circuit and application levels to maintain aggressive approximate datapaths and boost the benefits from the trade-off between accuracy and energy. Compared with a baseline design, the proposed method significantly reduces the power-delay product (PDP) with a negligible accuracy loss. Simulation and a case study of image processing validate the effectiveness of the proposed methodology. |
IP.3_6.3 | LAC: LEARNED APPROXIMATE COMPUTING Speaker: Tianmu Li, University of California, Los Angeles, US Authors: Vaibhav Gupta1, Tianmu Li2 and Puneet Gupta1 1UCLA, US; 2University of California, Los Angeles, US Abstract Approximate hardware trades acceptable error for improved performance, and previous literature focuses on optimizing this trade-off in the hardware. We show in this paper that the application (i.e., the software) can be optimized for better accuracy without losing any performance benefits of the approximate hardware. We propose LAC: learned approximate computing as a method of tuning the application parameters to compensate for hardware errors. Our approach showed improvements across a variety of standard signal/image processing applications, delivering an average improvement of 5.82 dB in PSNR and 0.23 in SSIM of the outputs. This translates to up to 87% power reduction and 83% area reduction for similar application quality. LAC allows the same approximate hardware to be used for multiple applications. |
IP.3_7 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_7.1 | EVA-CAM: A CIRCUIT/ARCHITECTURE-LEVEL EVALUATION TOOL FOR GENERAL CONTENT ADDRESSABLE MEMORIES Speaker: Liu Liu, University of Notre Dame, US Authors: Liu Liu1, Mohammad Mehdi Sharifi1, Ramin Rajaei2, Arman Kazemi1, Kai Ni3, Xunzhao Yin4, Michael Niemier1 and X. Sharon Hu1 1University of Notre Dame, US; 2Department of Computer Science and Engineering, University of Notre Dame, US; 3Rochester Institute of Technology, US; 4Zhejiang University, CN Abstract Content addressable memories (CAMs), a special-purpose in-memory computing (IMC) unit, support parallel searches directly in memory. There is growing interest in CAMs for data-intensive applications such as machine learning and bioinformatics. The design space for CAMs is rapidly expanding. In addition to traditional ternary CAMs (TCAMs), analog CAM (ACAM) and multi-bit CAM (MCAM) designs based on various non-volatile memory (NVM) devices have been recently introduced and may offer higher density, better energy efficiency, and non-volatility. Furthermore, aside from the widely-used exact match based search, CAM-based approximate matches have been proposed to further extend the utility of CAMs to new application spaces. For this memory architecture, evaluating different CAM design options for a given application is becoming more challenging. This paper presents Eva-CAM, a circuit/architecture-level modeling and evaluation tool for CAMs. Eva-CAM supports TCAM, ACAM, and MCAM designs implemented in non-volatile memories, for both exact and approximate match types. It also allows for the exploration of CAM array structures and sensing circuits. Eva-CAM has been validated with HSPICE simulation results and chip measurements. A comprehensive case study is described for FeFET CAM design space exploration. |
IP.3_7.2 | HYBRID DIGITAL-DIGITAL IN-MEMORY COMPUTING Speaker: Muhammad Rashedul Haq Rashed, University of Central Florida, US Authors: Muhammad Rashedul Haq Rashed1, Sumit Kumar Jha2, Fan Yao1 and Rickard Ewetz1 1University of Central Florida, US; 2University of Texas at San Antonio, US Abstract In-memory computing (IMC) using emerging non-volatile memory promises exascale computing capabilities for a number of data-intensive workloads. The state-of-the-art solution to accelerating high-assurance applications is based on digital in-memory computing. Digital in-memory computing can be WRITE-based or READ-based, i.e., logic is evaluated while switching or without switching the state of the non-volatile resistive devices. All prominent studies for accelerating matrix-vector multiplication (MVM) based applications utilize a single digital logic style. However, we observe that WRITE-based and READ-based digital in-memory computing are advantageous for dense and sparse matrices, respectively. In this paper, we propose a new computing paradigm called hybrid digital-digital in-memory computing. The paper also introduces an automated synthesis tool for mapping computation to a hybrid architecture. The key idea is to first decompose the matrix into dense and sparse blocks. Next, bit-slicing is used to further decompose the dense blocks into sparse and dense parts. The dense (sparse) blocks are mapped to WRITE-based (READ-based) digital in-memory accelerators. The proposed paradigm is evaluated using 12 applications from various domains. Compared with WRITE-based IMC, the hybrid digital-digital paradigm improves energy and speed by 13X and 20X, at the expense of increasing the area by 151X. Compared with READ-based IMC, the hybrid paradigm improves energy, speed, and area by 264X, 198X, and 2996X, respectively. |
IP.3_7.3 | NEUROHAMMER: INDUCING BIT-FLIPS IN MEMRISTIVE CROSSBAR MEMORIES Speaker: Felix Staudigl, Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE Authors: Felix Staudigl1, Hazem al Indari1, Daniel Schön2, Dominik Sisejkovic1, Farhad Merchant1, Jan Moritz Joseph1, Vikas Rana3, Stephan Menzel2 and Rainer Leupers1 1Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE; 2Peter Grünberg Institut (PGI-7), Forschungszentrum Jülich GmbH & JARA-FIT, DE; 3Peter Grünberg Institut (PGI-10), Forschungszentrum Jülich GmbH, DE Abstract Emerging non-volatile memory (NVM) technologies offer unique advantages in energy efficiency, latency, and features such as computing-in-memory. Consequently, emerging NVM technologies are considered an ideal substrate for computation and storage in future-generation neuromorphic platforms. These technologies need to be evaluated for fundamental reliability and security issues. In this paper, we present NeuroHammer, a security threat in ReRAM crossbars caused by thermal crosstalk between memory cells. We demonstrate that bit-flips can be deliberately induced in ReRAM devices in a crossbar by systematically writing adjacent memory cells. A simulation flow is developed to evaluate NeuroHammer and the impact of physical parameters on the effectiveness of the attack. Finally, we discuss the security implications in the context of possible attack scenarios. |
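As background for the CAM designs evaluated in IP.3_7.1, the short model below illustrates ternary-CAM match semantics: each stored row may contain "don't care" bits, and every row matching a query is returned (in hardware, this search happens in parallel across all rows). It is an illustrative software model only, not part of Eva-CAM.

```python
# Minimal software model of ternary-CAM (TCAM) exact-match semantics.
# Rows are strings over {'0','1','X'}, where 'X' means "don't care".
def tcam_match(rows, query):
    """Return the indices of all stored rows that match the binary query."""
    def matches(row):
        return all(r == 'X' or r == q for r, q in zip(row, query))
    return [i for i, row in enumerate(rows) if matches(row)]

table = ["10XX", "1101", "0XX1"]
print(tcam_match(table, "1011"))   # -> [0]; only the first row matches
```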
IP.3_8 Interactive presentations
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 11:30 CET - 12:15 CET
Interactive Presentations (IPs) run simultaneously during a 45-minute slot. Authors of IPs are available to present their work and answer questions throughout the session.
Label | Presentation Title Authors |
---|---|
IP.3_8.1 | A LOW-COST METHODOLOGY FOR EM FAULT EMULATION ON FPGA Speaker: Paolo Maistri, TIMA Laboratory, FR Authors: Paolo Maistri and Jiayun Po, TIMA Laboratory, FR Abstract In embedded systems, the presence of a security layer is now a well-established requirement. In order to guarantee the suitable level of performance and resistance against attacks, dedicated hardware implementations are often proposed to accelerate cryptographic computations in a controllable environment. On the other hand, these same implementations may be vulnerable to physical attacks, such as side channel analysis or fault injections. In this scenario, the designer must hence be able to assess the robustness of the implementation (and of the adopted countermeasures) as soon as possible in the design flow against several different threats. In this paper, we propose a methodology to characterize the robustness of a generic hardware design described at RTL against EM fault injections. Thanks to our framework, we are able to emulate the EM faults on FPGA platforms, without the need for expensive equipment or lengthy experimental campaigns. We present a tool supporting our methodology and the first validation tests done on several AES designs, confirming the feasibility of the proposed approach. |
IP.3_8.2 | RELIABILITY ANALYSIS OF FINFET-BASED SRAM PUFS FOR 16NM, 14NM, AND 7NM TECHNOLOGY NODES Speaker: Shayesteh Masoumian, Intrinsic ID, NL Authors: Shayesteh Masoumian1, Georgios Selimis1, Rui Wang1, Geert-Jan Schrijen1, Said Hamdioui2 and Mottaqiallah Taouil2 1Intrinsic ID, NL; 2Delft University of Technology, NL Abstract SRAM Physical Unclonable Functions (PUFs) are today commercially used, among other things, for security primitives such as key generation and authentication. The quality of the PUFs, and hence of the security primitives, depends on intrinsic variations which are technology dependent. Therefore, to sustain the commercial usage of PUFs for cutting-edge technologies, it is important to properly model and evaluate their reliability. In this work, we evaluate the SRAM PUF reliability using the within-class Hamming distance (WCHD) for 16nm, 14nm, and 7nm using simulations and silicon validation for both low-power and high-performance designs. The results show that our simulation models and expectations match the silicon measurements. From the experiments, we conclude the following: (1) SRAM PUF is reliable in advanced FinFET technology nodes, i.e., the noise is low in 16nm, 14nm, and 7nm, (2) temperature variations have a marginal impact on the reliability, and (3) both low-power and high-performance SRAMs can be used as a PUF without excessive need for error-correcting codes (ECCs). |
IP.3_8.3 | BOILS: BAYESIAN OPTIMISATION FOR LOGIC SYNTHESIS Speaker: Antoine Grosnit, Huawei Noah's Ark Lab, FR Authors: Antoine Grosnit1, Cedric Malherbe2, Xingchen Wan1, Rasul Tutunov1, Jun Wang3 and Haitham Bou Ammar1 1Huawei R&D London, GB; 2Huawei R&D Paris, FR; 3University College London, GB Abstract Optimising the quality-of-results (QoR) of circuits during logic synthesis is a formidable challenge necessitating the exploration of exponentially sized search spaces. While expert-designed operations aid in uncovering effective sequences, the increase in complexity of logic circuits favours automated procedures. To enable efficient and scalable solvers, we propose BOiLS, the first algorithm adapting Bayesian optimisation to navigate the space of synthesis operations. BOiLS requires no human intervention and trades off exploration versus exploitation through novel Gaussian process kernels and trust-region constrained acquisitions. In a set of experiments on EPFL benchmarks, we demonstrate BOiLS's superior performance compared to state-of-the-art in terms of both sample efficiency and QoR values. |
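The within-class Hamming distance (WCHD) metric used in IP.3_8.2 can be illustrated in a few lines: it is the fraction of bits that differ between an enrollment readout of an SRAM PUF and a later re-evaluation of the same device. The readout size and noise level below are assumed values chosen only for illustration.

```python
# Sketch of the within-class (fractional) Hamming distance between a reference
# PUF readout and a noisy re-evaluation. The 1-kbit size and ~3% noise are
# assumptions for the example, not measured figures from the paper.
import numpy as np

def fractional_hd(a: np.ndarray, b: np.ndarray) -> float:
    return np.count_nonzero(a != b) / a.size

reference = np.random.randint(0, 2, 1024)              # enrollment readout (1 kbit)
noisy = reference.copy()
flip = np.random.choice(1024, size=30, replace=False)  # ~3% noisy cells (assumed)
noisy[flip] ^= 1
print(f"WCHD = {fractional_hd(reference, noisy):.3f}") # ~0.029
```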
L.2 Panel: The future of conferences - what will DATE and the others be like?
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 13:00 CET - 14:00 CET
Session chair:
Ian O'Connor, Lyon Institute of Nanotechnology, FR
Panellists:
David Atienza, École Polytechnique Fédérale de Lausanne (EPFL), CH
Enrico Macii, Politecnico di Torino, IT
Yiran Chen, Duke University, US
Tulika Mitra, National University of Singapore, SG
The panel aims at exploring how conferences will be organized and attended after the Covid-19 pandemic, in order to meet time and cost sustainability, attendees' interests and needs, as well as the opportunities offered by technology.
21.1 Self-adaptive and Dynamic Resource Management, Learning at the Edge and Applications
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Heba Khdr, Karlsruhe Institute of Technology, DE
Session co-chair:
Federico Corradi, iMEC, NL
Self-adaptive and runtime decision making is increasingly important for optimizing the extra-functional behaviour of modern systems. The first part of this session covers various techniques for optimizing specific objectives, such as performance, energy consumption, and accuracy, and applies these techniques to different parts of the systems. The second part of this session includes papers that advance the state of the art of machine learning and its applications at the edge. The fourth paper improves localization through an encoder-based framework, the fifth one proposes novel and efficient accelerator architectures for probabilistic reasoning models, and the sixth one presents a high-efficiency and low-cost framework for federated learning.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 21.1.1 | (Best Paper Award Candidate) ACCURATE PROBABILISTIC MISS RATIO CURVE APPROXIMATION FOR ADAPTIVE CACHE ALLOCATION IN BLOCK STORAGE SYSTEMS Speaker: Yingtian Tang, University of Pennsylvania, US Authors: Rongshang Li1, Yingtian Tang2, QIQUAN SHI3, Hui Mao4, Lei Chen4, Jikun Jin5, Peng Lu5 and Zhuo Cheng6 1University of Sydney, AU; 2University of Pennsylvania, US; 3Huawei Noah's Ark Lab, HK; 4Huawei Noah's Ark Lab, CN; 5Huawei Storage Product Line, CN; 6Tsinghua University and Huawei Storage Product Line, CN Abstract Cache plays an important role in storage systems. With better allocation of cache space to each storage device, total I/O latency can be reduced remarkably. To achieve this goal, we propose an Accurate Probabilistic miss ratio curve approximation for Adaptive Cache allocation (APAC) system. APAC can obtain near-optimal performance for allocating cache space with low overhead. Specifically, with a linear-time probabilistic approximation of reuse distance of all blocks inside each device, APAC can accurately estimate the miss ratio curve (MRC). Furthermore, APAC utilizes the MRCs to obtain the near-optimal configuration of cache allocation by dynamic programming. Experimental results show that APAC achieves higher accuracy in MRC approximation compared to the state-of-the-art methods, leading to higher hit ratio and lower latency of the block storage systems. |
14:34 CET | 21.1.2 | SGRM: STACKELBERG GAME-BASED RESOURCE MANAGEMENT FOR EDGE COMPUTING SYSTEMS Speaker: Manolis Katsaragakis, National TU Athens and KU Leuven, GR Authors: Antonios Karteris1, Manolis Katsaragakis2, Dimosthenis Masouros1 and Dimitrios Soudris1 1National TU Athens, GR; 2National TU Athens and KU Leuven, GR Abstract The incessant technological advancements of recent Internet of Things (IoT) networks have led to a rapidly increasing number of connected devices and workloads. Resource management is a key technique for such systems to operate efficiently. In this paper, we present SGRM, a game theory-based framework for dynamic resource management of IoT networks under CPU, memory, bandwidth and latency constraints. SGRM combines a novel execution time prediction mechanism along with Stackelberg games and Vickrey auctions in order to tackle the multi-objective problem of task offloading in a competitive Edge Computing system. We design, implement and evaluate our novel game theory-based framework over a real IoT system for a diverse set of interference scenarios and varying devices, showing that i) the proposed prediction mechanism can provide accurate predictions, achieving 2.3% absolute percentage error on average, ii) SGRM achieves near-optimal results and outperforms alternative solutions by up to 66.6% and iii) SGRM provides scalable, real-time and lightweight performance characteristics. |
14:38 CET | 21.1.3 | RUNTIME ENERGY MINIMIZATION OF DISTRIBUTED MANY-CORE SYSTEMS USING TRANSFER LEARNING Speaker: Dainius Jenkus, Newcastle University, GB Authors: Dainius Jenkus, Fei Xia, Rishad Shafik and Alex Yakovlev, Newcastle University, GB Abstract The heterogeneity of computing resources continues to permeate into many-core systems making energy-efficiency a challenging objective. Existing rule-based and model-driven methods return sub-optimal energy-efficiency and limited scalability as system complexity increases to the domain of distributed systems. This is exacerbated further by dynamic variations of workloads and quality-of-service (QoS) demands. This work presents a QoS-aware runtime management method for energy minimization using a transfer learning (TL) driven exploration strategy. It enhances standard Q-learning to improve both learning speed and operational optimality (i.e., QoS and energy). The core to our approach is a multi-dimensional knowledge transfer across a task's state-action space. It accelerates the learning of dynamic voltage/frequency scaling (DVFS) control actions for tuning power/performance trade-offs. Firstly, the method identifies and transfers already learned policies between explored and behaviorally similar states referred to as Intra-Task Learning Transfer (ITLT). Secondly, if no similar “expert” states are available, it accelerates exploration at a local state's level through what’s known as Intra-State Learning Transfer (ISLT). A comparative evaluation of the approach indicates faster and more balanced exploration. This is shown through energy savings ranging from 7.30% to 18.06%, and improved QoS from 10.43% to 14.3%, when compared to existing exploration strategies. This method is demonstrated under WordPress and TensorFlow workloads on a server cluster. |
14:42 CET | 21.1.4 | SIAMESE NEURAL ENCODERS FOR LONG-TERM INDOOR LOCALIZATION WITH MOBILE DEVICES Speaker: Saideep Tiku, Colorado State University, US Authors: Saideep Tiku and Sudeep Pasricha, Colorado State University, US Abstract WiFi fingerprinting-based indoor localization on smartphones is an emerging application domain for enhanced positioning and tracking of people and assets within indoor locales. Unfortunately, the transmitted signal characteristics from independently maintained WiFi access points (APs) vary greatly over time. Moreover, some of the WiFi APs visible at the initial deployment phase may be replaced or removed over time. These factors are often ignored and cause gradual and cata-strophic degradation of indoor localization accuracy post-deployment, over weeks and months. We propose a Siamese neural encoder-based framework that offers up to 40% reduction in degradation of localization accuracy over time compared to the state-of-the-art in the area, without requiring any re-training. |
14:46 CET | 21.1.5 | DISCRETE SAMPLERS FOR APPROXIMATE INFERENCE IN PROBABILISTIC MACHINE LEARNING Speaker: Shirui Zhao, KU Leuven, BE Authors: Shirui Zhao1, Nimish Shah1, Wannes Meert2 and Marian Verhelst3 1Department of Electrical Engineering, ESAT-MICAS, KU Leuven, BE; 2Departement of Computer Science, KU Leuven, BE; 3KU Leuven, BE Abstract Probabilistic reasoning models (PMs) and probabilistic inference bring advantages when dealing with small datasets or uncertainty on the observed data, and allow to integrate expert knowledge and create interpretable models. The main challenge of using these PMs in practice is that their inference is very compute-intensive. Therefore, custom hardware architectures for the exact and approximate inference of PMs have been proposed in the SotA. The throughput, energy and area efficiency of approximate PM inference accelerators are strongly dominated by the sampler blocks required to sample arbitrary discrete distributions. This paper proposes and studies novel discrete sampler architectures towards efficient and flexible hardware implementations for PM accelerators. Both cumulative distribution table (CDT) and Knuth-Yao (KY) based sampling algorithms are assessed, based on which different sampler hardware architectures were implemented. Innovation is brought in terms of a reconfigurable CDT sampling architecture with a flexible range and a reconfigurable Knuth-Yao sampling architecture that supports both flexible range and dynamic precision. All architectures are benchmarked on real-world Bayesian Networks, demonstrating up to 13x energy efficiency benefits and 11x area efficiency improvement of the optimized reconfigurable Knuth-Yao sampler over the traditional linear CDT-based samplers used in the PM SotA. |
14:50 CET | 21.1.6 | HELCFL: HIGH-EFFICIENCY AND LOW-COST FEDERATED LEARNING IN HETEROGENEOUS MOBILE-EDGE COMPUTING Speaker: Yangguang Cui, East China Normal University, CN Authors: Yangguang Cui1, Kun Cao2, Junlong Zhou3 and Tongquan Wei1 1East China Normal University, CN; 2Jinan University, CN; 3Nanjing University of Science and Technology, CN Abstract Federated Learning (FL), an emerging distributed machine learning (ML), empowers a large number of embedded devices (e.g., phones and cameras) and a server to jointly train a global ML model without centralizing user private data on a server. However, when deploying FL in a mobile-edge computing (MEC) system, restricted communication resources of the MEC system, heterogeneity and constrained energy of user devices have a severe impact on FL training efficiency. To address these issues, in this article, we design a distinctive FL framework, called HELCFL, to achieve high-efficiency and low-cost FL training. Specifically, by analyzing the theoretical foundation of FL, our HELCFL first develops a utility-driven and greedy-decay user selection strategy to enhance FL performance and reduce training delay. Subsequently, by analyzing and utilizing the slack time in FL training, our HELCFL introduces a device operating frequency determination approach to reduce training energy costs. Experiments verify that our HELCFL can enhance the highest accuracy by up to 43.45%, realize the training speedup of up to 275.03%, and save up to 58.25% training energy costs compared to state-of-the-art baselines. |
14:54 CET | 21.1.7 | Q&A SESSION Authors: Heba Khdr1 and Federico Corradi2 1Karlsruhe Institute of Technology, DE; 2IMEC, NL Abstract Questions and answers with the authors |
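For readers unfamiliar with the sampler blocks discussed in presentation 21.1.5 above, the sketch below illustrates a plain cumulative-distribution-table (CDT) sampler, the kind of baseline against which reconfigurable Knuth-Yao designs are compared. The 16-bit fixed-point precision is an assumption; hardware implementations replace the scan loop with comparators.

```python
# Minimal sketch of CDT-based sampling of an arbitrary discrete distribution,
# assuming 16-bit fixed-point probabilities. Illustrative only.
import random

def build_cdt(probs, precision_bits=16):
    scale, acc, table = (1 << precision_bits), 0, []
    for p in probs:
        acc += round(p * scale)
        table.append(acc)
    table[-1] = scale                          # guard against rounding drift
    return table

def cdt_sample(table, precision_bits=16):
    r = random.getrandbits(precision_bits)     # uniform draw in [0, 2^16)
    for symbol, threshold in enumerate(table):
        if r < threshold:
            return symbol
    return len(table) - 1

cdt = build_cdt([0.5, 0.25, 0.125, 0.125])
counts = [0] * 4
for _ in range(10000):
    counts[cdt_sample(cdt)] += 1
print(counts)                                  # roughly [5000, 2500, 1250, 1250]
```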
21.2 Advances in defect detection and dependability
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Leticia Maria Bolzani Poehls, RWTH Aachen University, DE
Session co-chair:
Ernesto Sanchez, Politecnico di Torino, IT
This session addresses advances in defect detection and dependability improvement. We cover a wide range of aspects: from hotspot detection and variability reduction to minimize the influence of processing, up to design techniques that make latches resilient to radiation upsets of up to three nodes and improve security by masking the power signature.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 21.2.1 | HOTSPOT DETECTION VIA GRAPH NEURAL NETWORK Speaker: Shuyuan Sun, Fudan University, CN Authors: Shuyuan Sun1, Yiyang Jiang1, Fan Yang1, Bei Yu2 and Xuan Zeng1 1Fudan University, CN; 2The Chinese University of Hong Kong, HK Abstract Lithography hotspot detection is of great importance in chip manufacturing. It aims to find patterns that may incur defects in the early design stage. Inspired by the success of deep learning in computer vision, many works convert layouts into images, turning the hotspot detection problem into an image classification task. Traditional graph-based methods consume fewer computer resources and less detection time compared to image-based methods, but they have too many false alarms. In this paper, a hotspot detection approach via the graph neural network (GNN) is proposed. We also propose a novel representation model to map a layout to one graph, in which we introduce multi-dimensional features to encode components of the layout. Then we use a modified GNN to further process the extracted layout features and get an embedding of the local geometric relationship. Experimental results on the ICCAD2012 Contest benchmarks show our proposed approach can achieve over 10× speedup and fewer false alarms without loss of accuracy. On the ICCAD2020 benchmark, our model can achieve 2.10% higher accuracy compared with the previous approach. |
14:34 CET | 21.2.2 | FITACT: ERROR RESILIENT DEEP NEURAL NETWORKS VIA FINE-GRAINED POST-TRAINABLE ACTIVATION FUNCTIONS Speaker: Behnam Ghavami, Simon Fraser University, CA Authors: Behnam Ghavami1, Mani Sadati2, Zhenman Fang1 and Lesley Shannon1 1Simon Fraser University, CA; 2Independent Researcher, IR Abstract Deep neural networks (DNNs) are increasingly being deployed in safety-critical systems such as personal healthcare devices and self-driving cars. In such DNN-based systems, error resilience is a top priority since faults in DNN inference could lead to mispredictions and safety hazards. For latency-critical DNN inference on resource-constrained edge devices, it is nontrivial to apply conventional redundancy-based fault tolerance techniques. In this paper, we propose FitAct, a low-cost approach to enhance the error resilience of DNNs by deploying fine-grained post-trainable activation functions. The main idea is to precisely bound the activation value of each individual neuron via neuron-wise bounded activation functions, so that it could prevent the fault propagation in the network. To avoid complex DNN model re-training, we propose to decouple the accuracy training and resilience training, and develop a lightweight post-training phase to learn these activation functions with precise bound values. Experimental results on widely used DNN models such as AlexNet, VGG16, and ResNet50 demonstrate that FitAct outperforms state-of-the-art studies such as Clip-Act and Ranger in enhancing the DNN error resilience for a wide range of fault rates, while adding manageable runtime and memory space overheads. |
14:38 CET | 21.2.3 | WRAP: WEIGHT REMAPPING AND PROCESSING IN RRAM-BASED NEURAL NETWORK ACCELERATORS CONSIDERING THERMAL EFFECT Speaker: Ing-Chao Lin, National Cheng Kung University, TW Authors: Po-Yuan Chen, Fang-Yi Gu, Yu-Hong Huang and Ing-Chao Lin, National Cheng Kung University, TW Abstract Resistive random-access memory (RRAM) has shown great potential for computing in memory (CIM) to support the requirements of high memory bandwidth and low power in neuromorphic computing systems. However, the accuracy of RRAM-based neural network (NN) accelerators can degrade significantly due to the intrinsic statistical variations of the resistance of RRAM cells, as well as the negative effects of high temperatures. In this paper, we propose a subarray-based thermal-aware weight remapping and processing framework (WRAP) to map the weights of a neural network model into RRAM subarrays. Instead of dealing with each weight individually, this framework maps weights into subarrays and performs subarray-based algorithms to reduce computational complexity while maintaining accuracy under thermal impact. Experimental results demonstrate that using our framework, inference accuracy losses of four DNN models are less than 2% compared to the ideal results, and less than 1% with compensation applied, even when the surrounding temperature is around 360 K. |
14:42 CET | 21.2.4 | (Best Paper Award Candidate) SELF-TERMINATED WRITE OF MULTI-LEVEL CELL RERAM FOR EFFICIENT NEUROMORPHIC COMPUTING Speaker: Zongwu Wang, Shanghai Jiao Tong University, CN Authors: Zongwu Wang1, Zhezhi He1, Rui Yang1, Shiquan Fan2, Jie Lin3, Fangxin Liu1, Yueyang Jia1, Chenxi Yuan2, Qidong Tang1 and Li Jiang1 1Shanghai Jiao Tong University, CN; 2Xi’an Jiaotong University, CN; 3University of Central Florida, US Abstract The Resistive Random-Access-Memory (ReRAM) in crossbar structure has shown great potential in accelerating the vector-matrix multiplication, owing to the fascinating computing complexity reduction (from O(n^2) to O(1)). Nevertheless, the ReRAM cells still encounter device programming variation and resistance drifting during computation (known as read disturbance), which significantly hamper its analog computing precision. Inspired by prior precise memory programming works, we propose a Self-Terminating Write (STW) circuit for Multi-Level Cell (MLC) ReRAM. In order to minimize the area overhead, the design heavily reuses inherent computing peripherals (e.g., Analog-to-Digital Converter and Trans-Impedance Amplifier) in conventional dot-product engine. Thanks to the fast and precise programming capability of our design, the ReRAM cell can possess 4 linear distributed conductance levels, with minimum latency used for intermediate resistance refreshing. Our comprehensive cross-layer (device/circuit/architecture) simulation indicates that the proposed MLC STW scheme can effectively obtain 2-bit precision via a single programming pulse. Besides, our design outperforms the prior write & verify scheme by 4.7x and 2x in programming latency and energy, respectively. |
14:46 CET | 21.2.5 | SCLCRL: SHUTTLING C-ELEMENTS BASED LOW-COST AND ROBUST LATCH DESIGN PROTECTED AGAINST TRIPLE NODE UPSETS IN HARSH RADIATION ENVIRONMENTS Speaker: Aibin Yan, Anhui University, CN Authors: Aibin Yan1, Zhixing Li1, Shiwei Huang1, Zijie Zhai1, Xiangyu Cheng1, Jie Cui1, Tianming Ni2, Xiaoqing Wen3 and Patrick Girard4 1Anhui University, CN; 2Anhui Polytechnic University, CN; 3Kyushu Institute of Technology, JP; 4LIRMM / CNRS, FR Abstract As the CMOS technology is continuously scaling down, nano-scale integrated circuits are becoming susceptible to harsh-radiation induced soft errors, such as double-node upsets (DNUs) and triple-node upsets (TNUs). This paper presents a shuttling C-elements based low-cost and robust latch (namely SCLCRL) that can recover from any TNU in harsh radiation environments. The latch comprises seven primary storage nodes and seven secondary storage nodes. Each pair of primary nodes feeds a secondary node through one C-element (CE) and each pair of secondary nodes feeds a primary node through another CE, forming redundant feedback loops to robustly retain values. Simulation results validate all key TNU-recoverability features of the proposed latch. Simulation results also demonstrate that the proposed SCLCRL latch can approximately save 29% silicon area and 47% D-Q delay on average at the cost of moderate power, compared with the state-of-the-art TNU-recoverable reference latches of the same type. |
14:50 CET | 21.2.6 | LEAKAGE POWER ANALYSIS IN DIFFERENT S-BOX MASKING PROTECTION SCHEMES Speaker: Javad Bahrami, University of Maryland Baltimore County, US Authors: Javad Bahrami1, Mohammad Ebrahimabadi1, Jean Luc Danger2, Sylvain Guilley3 and Naghmeh Karimi1 1University of Maryland Baltimore County, US; 2Télécom ParisTech, FR; 3Secure-IC, FR Abstract Internet-of-Things (IoT) devices are natural targets for side-channel attacks. Still, side-channel leakage can be complex: its modeling can be assisted by statistical tools. Projection of the leakage into an orthonormal basis allows to understand its structure, typically linear (1st-order leakage) or non-linear (sometimes referred to as glitches). In order to ensure cryptosystems protection, several masking methods have been published. Unfortunately, they follow different strategies; thus it is hard to compare them. Namely, ISW is constructive, GLUT is systematic, RSM is a low-entropy version of GLUT, RSM-ROM is a further optimization aiming at balancing the leakage further, and TI aims at avoiding, by design, the leakage arising from the glitches. In practice, no study has compared these styles on an equal basis. Accordingly, in this paper, we present a consistent methodology relying on a Walsh-Hadamard transform in this respect. We consider different masked implementations of substitution boxes of PRESENT algorithm, as this function is the most leaking in symmetric cryptography. We show that ISW is the most secure among the considered masking implementations. For sure, it takes strong advantage of the knowledge of the PRESENT substitution box equation. Tabulated masking schemes appear as providing a lesser amount of security compared to unprotected counterparts. The leakage is assessed over time, i.e., considering device aging which contributes to mitigate the leakage differently according to the masking style. |
14:54 CET | 21.2.7 | Q&A SESSION Authors: Leticia Maria Bolzani Poehls1 and Ernesto Sanchez2 1RWTH Aachen University, DE; 2Politecnico di Torino, IT Abstract Questions and answers with the authors |
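The neuron-wise bounded activation idea behind presentation 21.2.2 (FitAct) can be pictured as a ReLU whose output is clamped to a per-neuron upper bound, so that a fault-corrupted value cannot propagate unbounded through the network. The PyTorch sketch below uses an assumed initial bound of 6.0; FitAct instead learns the bounds in a lightweight post-training phase, which is not reproduced here.

```python
# Illustrative sketch of a per-neuron bounded ReLU. The initial bound of 6.0 is
# an assumption; the paper learns precise bounds post-training.
import torch
import torch.nn as nn

class BoundedReLU(nn.Module):
    def __init__(self, num_neurons):
        super().__init__()
        # one trainable upper bound per neuron (initialised to an assumed 6.0)
        self.bound = nn.Parameter(torch.full((num_neurons,), 6.0))

    def forward(self, x):                          # x: (batch, num_neurons)
        return torch.minimum(torch.relu(x), self.bound)

act = BoundedReLU(4)
x = torch.tensor([[0.5, -1.0, 7.0, 1e6]])          # 1e6 emulates a fault-corrupted value
print(act(x))                                      # -> [[0.5, 0.0, 6.0, 6.0]]
```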
21.3 Real-Time Systems and Technology
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Renato Mancuso, Boston University, US
Session co-chair:
Yasmina Abdeddaim, UGE, FR
Modern embedded real-time systems are facing multiple challenges related to predictably scheduling concurrent and parallel task systems upon multi-core and heterogeneous platforms. In this session, we present a number of exciting papers addressing timing-predictability challenges related to memory-aware scheduling, parallel task systems, partitioning hypervisors, controller-area network (CAN) and energy optimization for embedded systems.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 21.3.1 | (Best Paper Award Candidate) CACHE-AWARE SCHEDULABILITY ANALYSIS OF PREM COMPLIANT TASKS Speaker: Syed Aftab Rashid, CISTER, ISEP Polytechnic Institute of Porto, PT Authors: Syed Aftab Rashid1, Muhammad Ali Awan1, Pedro Souto2, Konstantinos Bletsas1 and Eduardo Tovar1 1CISTER, ISEP Polytechnic Institute of Porto, PT; 2University of Porto, PT Abstract The Predictable Execution Model (PREM) is useful for mitigating inter-core interference due to shared resources such as the main memory. However, it is cache-agnostic, which makes schedulability analysis pessimistic, via overestimation of prefetches and write-backs. In response, we present cache-aware schedulability analysis for PREM tasks on fixed-task-priority partitioned multicores, that bounds the number of cache prefetches and write-backs. Our approach identifies memory blocks loaded in the execution of a previous scheduling interval of each task, that remain in the cache until its next scheduling interval. Doing so greatly reduces the estimated prefetches and write-backs. In experimental evaluations, our analysis improves the schedulability of PREM tasks by up to 55 percentage points. |
14:34 CET | 21.3.2 | RECONCILING QOS AND CONCURRENCY IN NVIDIA GPUS VIA WARP-LEVEL SCHEDULING Speaker: Jayati Singh, University of Illinois Urbana-Champaign, US Authors: Jayati Singh1, Ignacio Sañudo Olmedo2, Nicola Capodieci2, Andrea Marongiu2 and Marco Caccamo3 1University of Illinois Urbana-Champaign, US; 2University of Modena and Reggio Emilia, IT; 3TU Munich, DE Abstract The widespread deployment of NVIDIA GPUs in latency-sensitive systems today requires predictable GPU multi-tasking, which cannot be trivially achieved. The NVIDIA CUDA API allows programmers to easily exploit the processing power provided by these massively parallel accelerators and is one of the major reasons behind their ubiquity. However, NVIDIA GPUs and the CUDA programming model favor throughput instead of latency and timing predictability. Hence, providing real-time and quality-of-service (QoS) properties to GPU applications presents an interesting research challenge. Such a challenge is paramount when considering simultaneous multikernel (SMK) scenarios, wherein kernels are executed concurrently within each streaming multiprocessor (SM). In this work, we explore QoS-based fine-grained multitasking in SMK via job arbitration at the lowest level of the GPU scheduling hierarchy, i.e., between warps. We present QoS-aware warp scheduling (QAWS) and evaluate it against state-of-the-art, kernel-agnostic policies seen in NVIDIA hardware today. Since the NVIDIA ecosystem lacks a mechanism to specify and enforce kernel priority at the warp granularity, we implement and evaluate our proposed warp scheduling policy on GPGPU-Sim. QAWS not only improves the response time of the higher priority tasks but also has comparable or better throughput than the state-of-the-art policies. |
14:38 CET | 21.3.3 | COUNTING PRIORITY INVERSIONS: COMPUTING MAXIMUM ADDITIONAL CORE REQUESTS OF DAG TASKS Speaker: Morteza Mohaqeqi, Uppsala University, SE Authors: Morteza Mohaqeqi, Gaoyang Dai and Wang Yi, Uppsala University, SE Abstract Many parallel real-time applications can be modeled as DAG tasks. Guaranteeing timing constraints of such applications executed on multicore systems is challenging, especially for the applications with non-preemptive execution blocks. The existing approach for timing analysis of such tasks with sporadic release relies on computing a bound on the interfering workload on a task, which depends on the number of priority inversions the task may experience. The number of priority inversions, in turn, is a function of the total number of additional cores a task instance may request after each node spawning. In this paper, we show that the previously proposed polynomial-time algorithm to compute the maximum number of additional core requests of a DAG is not correct, providing a counter example. We show that the problem is in fact NP-hard. We then present an ILP formulation as an exact solution to the problem. Our evaluations show that the problem can be solved in a few minutes even for DAGs with hundreds of nodes. |
14:42 CET | 21.3.4 | SHYPER: AN EMBEDDED HYPERVISOR APPLYING HIERARCHICAL RESOURCE ISOLATION STRATEGIES FOR MIXED-CRITICALITY SYSTEMS Speaker: Siran Li, Beihang University, CN Authors: YiCong Shen, Lei Wang, YuanZhi Liang, SiRan Li and Bo Jiang, School of Computer Science and Engineering, Beihang University, CN Abstract With the development of the IoT, modern embedded systems are evolving to general-purpose and mixed-criticality systems, where virtualization has become the key to guarantee the isolation between tasks with different criticality. Traditional server-based hypervisors (KVM and Xen) are difficult to use in embedded scenarios due to performance and security reasons. As a result, several new hypervisors (Jailhouse and Bao) have been proposed in recent years, which effectively solve the problems above through static partitioning. However, this inflexible resource isolation strategy assumes no resource sharing across guests, which greatly reduces resource utilization and VM scalability. This prevents them from simultaneously fulfilling the differentiated demands from VMs conducting different tasks. This paper proposes an efficient and real-time embedded hypervisor "Shyper", aiming at providing differentiated services for VMs with different criticality. To achieve that, Shyper supports fine-grained hierarchical resource isolation strategies and introduces several novel "VM-Exit-less" real-time virtualization techniques, which grant users the flexibility to strike a trade-off between VM resource utilization and real-time performance. In this paper, we also compare Shyper with other mainstream hypervisors (KVM, Jailhouse, etc.) to evaluate its feasibility and effectiveness. |
14:46 CET | 21.3.5 | RESPONSE TIME ANALYSIS FOR ENERGY-HARVESTING MIXED-CRITICALITY SYSTEMS Speaker: Kankan Wang, Northeastern University, CN Authors: Kankan Wang, Yuhan Lin and Qingxu Deng, Northeastern University, CN Abstract With the increasing demand for real-time computing applications on energy-harvesting embedded devices, which are deployed wherever recharging is not possible or practical, worst-case performance analysis becomes crucial. However, it is difficult to bound the worst-case response time of tasks under both timing and energy constraints due to the uncertainty of the harvested energy. Motivated by this, this paper studies response time analysis for Energy-Harvesting Mixed-Criticality (EHMC) systems. We present a schedulability analysis algorithm that extends the Adaptive Mixed Criticality (AMC) approach to EHMC systems. Furthermore, we develop two response time bounds for it. To the best of our knowledge, this is the first work on response time analysis for EHMC systems. Finally, we examine both the effectiveness and the tightness of the bounds by experiments. |
14:50 CET | 21.3.6 | LATENCY ANALYSIS OF SELF-SUSPENDING TASK CHAINS Speaker: Tomasz Kloda, TU Munich, DE Authors: Tomasz Kloda1, Jiyang Chen2, Antoine Bertout3, Lui Sha2 and Marco Caccamo1 1TU Munich, DE; 2University of Illinois at Urbana-Champaign, US; 3LIAS, Université de Poitiers, ISAE-ENSMA, FR Abstract Many cyber-physical systems are offloading computation-heavy programs to hardware accelerators (e.g., GPU and TPU) to reduce execution time. These applications will self-suspend between offloading data to the accelerators and obtaining the returned results. Previous efforts have shown that self-suspending tasks can cause scheduling anomalies, but none has examined inter-task communication. This paper aims to explore self-suspending tasks' data chain latency with periodic activation and asynchronous message passing. We first present the cause for suspension-induced delays and a worst-case latency analysis. We then propose a rule for utilizing the hardware co-processors to reduce data chain latency, together with a schedulability analysis. Simulation results show that the proposed strategy can improve overall latency while preserving system schedulability. (For orientation, a sketch of the classic latency bound for asynchronous periodic chains follows this session's table.) |
14:54 CET | 21.3.7 | Q&A SESSION Authors: Renato Mancuso1 and Yasmina ABDEDDAÏM2 1Boston University, US; 2LIGM, Univ Gustave Eiffel, CNRS, FR Abstract Questions and answers with the authors |
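For readers who want a concrete reference point for the data-chain latency discussion in 21.3.6, the sketch below computes the classic end-to-end latency bound for a chain of periodic tasks communicating asynchronously through shared registers: the sum over the chain of each task's period plus its worst-case response time. This is a well-known baseline bound, not the suspension-aware analysis contributed by the paper, and the task parameters are invented for illustration.

```python
# Illustrative only: classic end-to-end latency bound for an asynchronous,
# register-based cause-effect chain of periodic tasks. Each task i has a
# period T_i and a worst-case response time R_i; in the worst case a sample
# waits almost a full period before the next task reads it, plus that task's
# response time. This is NOT the suspension-aware analysis of paper 21.3.6.

def chain_latency_bound(tasks):
    """tasks: ordered list of (period, wcrt) pairs along the chain."""
    return sum(period + wcrt for period, wcrt in tasks)

if __name__ == "__main__":
    # (period, worst-case response time) in milliseconds; values are invented.
    chain = [(10.0, 3.0), (20.0, 7.0), (10.0, 4.0)]
    print("End-to-end latency bound:", chain_latency_bound(chain), "ms")
```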
21.4 Defense Techniques for Secure and Trustworthy Systems
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 14:30 CET - 15:30 CET
Session chair:
Sophie Dupuis, LIRMM, University of Montpellier, FR
Session co-chair:
Elif Bilge Kavun, Univ Passau, DE
Building novel defense mechanisms to thwart attacks on real-world systems is vital, given the valuable assets in such systems that need to be protected. This session focuses on defense techniques providing countermeasures against side-channel attacks and hardware Trojans. The contributions in this session include conventional as well as novel machine learning methods for the detection of hardware Trojans, protection mechanisms against side-channel analysis of chips and neural networks, and the exploitation of sequentiality and synthesis flexibility in logic obfuscation for thwarting SAT attacks.
Time | Label | Presentation Title Authors |
---|---|---|
14:30 CET | 21.4.1 | COUNTERACT SIDE-CHANNEL ANALYSIS OF NEURAL NETWORKS BY SHUFFLING Speaker: Manuel Brosch, TU Munich, DE Authors: Manuel Brosch1, Matthias Probst1 and Georg Sigl2 1TU Munich, DE; 2TU Munich / Fraunhofer Institute for Applied and Integrated Security (AISEC), DE Abstract Machine learning is becoming an essential part of almost every electronic device. Implementations of neural networks are mostly targeted towards computational performance or memory footprint. Nevertheless, security is also important in order to keep the network secret and protect the intellectual property associated with it. In particular, since neural network implementations have been demonstrated to be vulnerable to side-channel analysis, powerful and computationally cheap countermeasures are in demand. In this work, we apply a shuffling countermeasure to a microcontroller implementation of a neural network to prevent side-channel analysis. The countermeasure is effective while the computational overhead is low. We investigate the extensions necessary for our countermeasure, and how shuffling increases the effort for an attack in theory. In addition, we demonstrate the increase in effort for an attacker through experiments on real side-channel measurements. Based on the mechanism of shuffling and our experimental results, we conclude that an attack on a commonly used neural network with shuffling is no longer feasible in a reasonable amount of time. (A generic illustration of the shuffling idea follows this session's table.) |
14:34 CET | 21.4.2 | GNN4GATE: A BI-DIRECTIONAL GRAPH NEURAL NETWORK FOR GATE-LEVEL HARDWARE TROJAN DETECTION Speaker: Dong Cheng, College of Computer and Data Science, Fuzhou University, Fuzhou, China, CN Authors: Dong Cheng1, Chen Dong1, Wenwu He2, Zhenyi Chen3 and Yi Xu1 1Fuzhou University, CN; 2Fujian University of Technology, CN; 3University of South Florida, US Abstract Hardware is the physical foundation of cyberspace, and chips are its core components; security risks in chips can therefore have severe, far-reaching consequences. Hardware Trojans (HTs) are malicious circuits and a primary security issue for chips. Recently, a series of machine learning-based HT detection methods were proposed. However, some shortcomings still deserve further consideration, such as relying too heavily on manual feature extraction, losing signal-propagation structure information, and difficulty in locating HTs and adapting to various HT types. To address the above challenges, this paper proposes a gate-level HT detection method based on Graph Neural Networks (GNNs), named GNN4Gate, which is a golden-free Trojan-gate identification technology. Specifically, a special coding method combining logic gate type and port connection information is developed for circuit graph modeling. Based on this, taking logic gates as the classification object, an automatic GNN detection architecture based on a Bi-directional Graph Convolutional Network (Bi-GCN) is developed to aggregate both the circuit signal propagation (forward) and dispersion (backward) structure features from the circuit graph. The proposed method is evaluated on Trusthub benchmarks with different functional HTs; the average True Positive Rate (Recall) is 87.14%, and the average True Negative Rate is 99.73%. The experimental results demonstrate that GNN4Gate is sufficiently accurate compared to the state-of-the-art detection works at gate level. |
14:38 CET | 21.4.3 | GOLDEN MODEL-FREE HARDWARE TROJAN DETECTION BY CLASSIFICATION OF NETLIST MODULE GRAPHS Speaker: Alexander Hepp, TU Munich, DE Authors: Alexander Hepp1, Johanna Baehr1 and Georg Sigl2 1TU Munich, DE; 2TU Munich/Fraunhofer AISEC, DE Abstract In a world where increasingly complex integrated circuits are manufactured in supply chains across the globe, hardware Trojans are an omnipresent threat. State-of-the-art methods for Trojan detection often require a golden model of the device under test. Other methods that operate on the netlist without a golden model can not handle complex designs and operate on Trojan-specific sets of netlist graph features. In this work, we propose a novel machine-learning-based method for hardware Trojan detection. Our method first uses a library of known malicious and benign modules in hierarchical designs to train an eXtreme Gradient Boosted Tree Classifier (XGBClassifier). For training, we generate netlist graphs of each hierarchical module and calculate feature vectors comprising structural characteristics of these graphs. After the training phase, we can analyze the synthesized hierarchical modules of an unknown design under test. The method calculates a feature vector for each module. With this feature vector, each module can be classified into either benign or malicious by the previously trained XGBClassifier. After classifying all modules, we derive a classification for all standard cells in the design under test. This technique allows the identification of hardware Trojan cells in a design and highlights regions of interest to direct further reverse engineering efforts. Experiments show that this approach performs with >97% Sensitivity and Specificity across available and generated hardware Trojan benchmarks and can be applied to more complex designs than previous netlist-based methods while maintaining similar computational complexity. |
14:42 CET | 21.4.4 | JANUS-HD: EXPLOITING FSM SEQUENTIALITY AND SYNTHESIS FLEXIBILITY IN LOGIC OBFUSCATION TO THWART SAT ATTACK WHILE OFFERING STRONG CORRUPTION Speaker: Leon Li, University of California, San Diego, US Authors: Leon Li1 and Alex Orailoglu2 1University of California, San Diego, US; 2UC San Diego, US Abstract Logic obfuscation has been proposed as a countermeasure against chip counterfeiting and IP piracy by obfuscating circuit designs with a key-controlled locking mechanism. However, the extensive output corruption of early key gate based logic obfuscation techniques has exposed them to effective SAT attacks. While current SAT resilient logic obfuscation techniques succeed in undermining the attack by offering near-trivial output corruption, they do so at the expense of a drastic reduction in functional and structural protection scope. In this work, we present JANUS-HD, based on novel insights that succeed in delivering the heretofore elusive goal of simultaneously boosting corruptibility and foiling SAT attacks. JANUS-HD obfuscates an FSM through diverse FF configurations for different transitions with the overall configuration setting as the obfuscation secret. A key-controlled Hamming distance comparator controls the obfuscation status at the minimized number of entrance states identified through a custom graph partitioning algorithm. Reliance on the inherent state transition patterns extends the obfuscation benefits to non-entrance states without exposing any additional key space pruning trace. We leverage the flexibility of state encoding and equivalence-based FSM transformations to generate an obfuscated netlist at low overhead using standard synthesis tools. Finally, we present a scan chain crippling mechanism that delivers unfettered scan chain access while eradicating any key trace leakage in the scan mode, thus thwarting chosen-input attacks aimed at the Hamming distance comparator. We illustrate through experiments that JANUS-HD delivers obfuscation scope improvements of up to 45.5x over the state-of-the-art, establishing the first cost-effective solution to offer a broad yet attack-resilient obfuscation scope against supply chain threats. |
14:46 CET | 21.4.5 | TRILOCK: IC PROTECTION WITH TUNABLE CORRUPTIBILITY AND RESILIENCE TO SAT AND REMOVAL ATTACKS Speaker: Yuke Zhang, University of Southern California, US Authors: Yuke Zhang, Yinghua Hu, Pierluigi Nuzzo and Peter Beerel, University of Southern California, US Abstract Sequential logic locking has been studied over the last decade as a method to protect sequential circuits from reverse engineering. However, most of the existing sequential logic locking techniques are threatened by increasingly more sophisticated SAT-based attacks, efficiently using input queries to a SAT solver to rule out incorrect keys, as well as removal attacks based on structural analysis. In this paper, we propose TriLock, a sequential logic locking method that simultaneously addresses these vulnerabilities. TriLock can achieve high, tunable functional corruptibility while still guaranteeing exponential queries to the SAT solver in a SAT-based attack. Further, it adopts a state re-encoding method to obscure the boundary between the original state registers and those inserted by the locking method, thus making it more difficult to detect and remove the locking-related components. |
14:50 CET | 21.4.6 | Q&A SESSION Authors: Sophie Dupuis1 and Elif Bilge Kavun2 1LIRMM, FR; 2University of Passau, DE Abstract Questions and answers with the authors |
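As a hedged illustration of the shuffling idea referenced in 21.4.1 (not the authors' microcontroller implementation), the sketch below evaluates the neurons of a fully-connected layer in a fresh random order on every inference. The numerical result is unchanged, but the mapping between points in time and specific weights is randomized, which is what raises the attack effort; all sizes and values are arbitrary.

```python
# Illustrative only: generic shuffling of the neuron evaluation order in a
# dense layer. The output equals the unshuffled computation, but an attacker
# can no longer assume that a given instant corresponds to a fixed neuron.
# This is not the implementation evaluated in paper 21.4.1.
import numpy as np

rng = np.random.default_rng()

def dense_shuffled(x, W, b):
    """x: (n_in,), W: (n_out, n_in), b: (n_out,) -> (n_out,)."""
    y = np.empty_like(b)
    order = rng.permutation(len(b))   # fresh random permutation per inference
    for j in order:                   # neurons evaluated in shuffled order
        y[j] = W[j] @ x + b[j]
    return y

x = rng.standard_normal(8)
W = rng.standard_normal((4, 8))
b = rng.standard_normal(4)
assert np.allclose(dense_shuffled(x, W, b), W @ x + b)  # same result
```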
22.1 Heterogeneous system-on-chip design methods
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Lana Josipovic, ETH Zurich, CH
Session co-chair:
John Wickerson, Imperial College, GB
The session presents various design methods addressing an array of important challenges in heterogeneous system-on-chip design. They cover not only system-level techniques for FPGA and NoC architectures, but also high-level synthesis solutions for performance improvement, power estimation, energy efficiency, and data/IP protection. We complete the session with two interactive presentations about coarse-grained reconfigurable architectures and cloud systems.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 22.1.1 | UNDERSTANDING AND MITIGATING MEMORY INTERFERENCE IN FPGA-BASED HESOCS Speaker: Gianluca Brilli, University of Modena and Reggio Emilia, IT Authors: Gianluca Brilli, Alessandro Capotondi, Paolo Burgio and Andrea Marongiu, Unimore, IT Abstract Like most high-end embedded systems, FPGA-based systems-on-chip (SoCs) are increasingly adopting heterogeneous designs, where CPU cores, the configurable logic and other ICs all share the interconnect and the main memory (DRAM) controller. This paradigm is scalable and reduces production costs and time-to-market, but creates resource contention issues, which ultimately affect the programs' timing. This problem has been widely studied on CPU- and GPU-based systems, along with strategies to mitigate such effects, but little has been done so far to systematically study the problem on FPGA-based SoCs. This work provides an in-depth analysis of memory interference on such systems, targeting two state-of-the-art commercial FPGA SoCs. We also discuss architectural support for Controlled Memory Request Injection (CMRI), a technique that has proven effective at reducing the bandwidth under-utilization implied by naive schemes that solve the interference problem by only allowing mutually exclusive access to the shared resources. Our experimental results show that: i) memory interference can slow down CPU tasks by up to 16x in the tested FPGA-based SoCs; ii) CMRI makes it possible to exploit more than 40% of the memory bandwidth available to FPGA accelerators (normally completely unused in PREM-like schemes), keeping the slowdown due to interference below 10%. (A generic sketch of budget-regulated request injection follows this session's table.) |
15:44 CET | 22.1.2 | (Best Paper Award Candidate) POWERGEAR: EARLY-STAGE POWER ESTIMATION IN FPGA HLS VIA HETEROGENEOUS EDGE-CENTRIC GNNS Speaker: Zhe Lin, Peng Cheng Laboratory, CN Authors: Zhe Lin1, Zike Yuan2, Jieru Zhao3, Wei Zhang4, Hui Wang1 and Yonghong Tian5 1Peng Cheng Laboratory, CN; 2University of Auckland, NZ; 3Shanghai Jiao Tong University, CN; 4Hong Kong University of Science and Technology, HK; 5Peking University & Peng Cheng Laboratory, CN Abstract Power estimation is the basis of many hardware optimization strategies. However, it is still challenging to offer accurate power estimation at an early stage such as high-level synthesis (HLS). In this paper, we propose PowerGear, a graph-learning-assisted power estimation approach for FPGA HLS, which features high accuracy, efficiency and transferability. PowerGear comprises two main components: a graph construction flow and a customized graph neural network (GNN) model. Specifically, in the graph construction flow, we introduce buffer insertion, datapath merging, graph trimming and feature annotation techniques to transform HLS designs into graph-structured data, which encode both intra-operation micro-architectures and inter-operation interconnects annotated with switching activities. Furthermore, we propose a novel power-aware heterogeneous edge-centric GNN model which effectively learns heterogeneous edge semantics and structural properties of the constructed graphs via edge-centric neighborhood aggregation, and fits the formulation of dynamic power. Compared with on-board measurement, PowerGear estimates total and dynamic power for new HLS designs with errors of 3.60% and 8.81%, respectively, which outperforms the prior arts in research and the commercial product Vivado. In addition, PowerGear demonstrates a speedup of 4x over Vivado power estimator. Finally, we present a case study in which PowerGear is exploited to facilitate design space exploration for FPGA HLS, leading to a performance gain of up to 11.2%, compared with methods using state-of-the-art predictive models. |
15:48 CET | 22.1.3 | ENERGY EFFICIENT, REAL-TIME AND RELIABLE TASK DEPLOYMENT ON NOC-BASED MULTICORES WITH DVFS Speaker: Lei Mo, Southeast University, CN Authors: Lei Mo1, Qi Zhou1, Angeliki Kritikakou2 and Ji Liu3 1Southeast University, CN; 2Univ Rennes, Inria, CNRS, IRISA, FR; 3Baidu Research, CN Abstract Task deployment plays an important role in the overall system performance, especially for complex architectures, including several cores with Dynamic Voltage and Frequency Scaling (DVFS) and Network-on-Chips (NoC). Task deployment affects not only the energy consumption but also the real-time response and reliability of the system. In this work, a task deployment approach is proposed to optimize the overall system energy consumption, including computation of the cores and communication of the NoC, under task reliability and real-time constraints. More precisely, the task deployment approach combines task allocation and scheduling, frequency assignment, task duplication, and multi-path data routing. The task deployment problem is formulated using mixed-integer non-linear programming. To find the optimal solution, the original problem is equivalently transformed to mixed-integer linear programming, and solved by state-of-the-art solvers. Furthermore, a decomposition-based heuristic, with low computational complexity, is proposed to deal with scalability. Finally, extended simulations evaluate the proposed methods. |
15:52 CET | 22.1.4 | COXHE: A SOFTWARE-HARDWARE CO-DESIGN FRAMEWORK FOR FPGA ACCELERATION OF HOMOMORPHIC COMPUTATION Speaker: Mingqin Han, Shandong University, CN Authors: Mingqin Han1, Yilan Zhu1, Qian Lou2, Zimeng Zhou1, Shanqing Guo1 and Lei Ju1 1Shandong University, CN; 2Indiana University, US Abstract Data privacy becomes a crucial concern in the AI and big data era. Fully homomorphic encryption (FHE) is a promising data privacy protection technique where the entire computation is performed on encrypted data. However, the dramatic increase of the computation workload restrains the usage of FHE in real-world applications. In this paper, we propose an FPGA accelerator design framework for CKKS-based HE. Since the key-switch operations are the primary performance bottleneck of FHE computation, we propose a low-latency design of the key-switch module with reduced intra-operation data dependency. Compared with the state-of-the-art FPGA-based key-switch implementation that is based on Verilog, the proposed high-level synthesis (HLS) based design reduces the operation latency by 40%. Furthermore, we propose an automated design space exploration framework which generates optimal encryption parameters and accelerators for a given application kernel and the target FPGA device. Experimental results for a set of real HE application kernels on different FPGA devices show that our HLS-based flexible design framework produces substantially better accelerator designs compared with a fixed-parameter HE accelerator in terms of security, approximation error, and overall performance. |
15:56 CET | 22.1.5 | A COMPOSABLE DESIGN SPACE EXPLORATION FRAMEWORK TO OPTIMIZE BEHAVIORAL LOCKING Speaker: Christian Pilato, Politecnico di Milano, IT Authors: Luca Collini1, Ramesh Karri2 and Christian Pilato1 1Politecnico di Milano, IT; 2NYU, US Abstract Globalization of the integrated circuit (IC) supply chain exposes designs to security threats such as reverse engineering and intellectual property (IP) theft. Designers may want to protect specific high-level synthesis (HLS) optimizations or micro-architectural solutions of their designs. Hence, protecting the IP of ICs is essential. Behavioral locking is an approach to thwart these threats by operating at high levels of abstraction instead of reasoning on the circuit structure. Like any security protection, behavioral locking requires additional area. Existing locking techniques have a different impact on security and overhead, but they do not explore the effects of alternatives when making locking decisions. We develop a design-space exploration (DSE) framework to optimize behavioral locking for a given security metric. For instance, we optimize differential entropy under area or key-bit constraints. We define a set of heuristics to score each locking point by analyzing the system dependence graph of the design. The solution yields better results for 92% of the cases when compared to baseline, state-of-the-art (SOTA) techniques. The approach has results comparable to evolutionary DSE while requiring 100x to 400x less computational time. |
16:00 CET | 22.1.6 | Q&A SESSION Authors: Lana Josipovic1 and John Wickerson2 1ETH Zurich, CH; 2Imperial College London, GB Abstract Questions and answers with the authors |
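To make the idea of regulated memory request injection in 22.1.1 more tangible, here is a hedged software sketch of a budget-per-period regulator: requests are delayed so that no more than a fixed number of them are issued in any regulation window. CMRI itself is architectural support evaluated on real FPGA SoCs, not this Python model, and the budget, period and arrival times below are invented.

```python
# Illustrative only: a budget-per-period request regulator, in the spirit of
# controlled memory request injection but NOT the CMRI hardware mechanism of
# paper 22.1.1. All parameters are invented.

def regulate(arrivals, budget, period):
    """Return issue times so that at most `budget` requests are issued within
    any aligned window of length `period`; requests are served FIFO."""
    issue_times = []
    window_start, used = 0.0, 0
    for t in sorted(arrivals):
        ready = max(t, window_start)           # cannot issue before arrival
        while ready >= window_start + period:  # advance to the current window
            window_start += period
            used = 0
        if used == budget:                     # window budget exhausted
            window_start += period
            used = 0
            ready = window_start
        issue_times.append(ready)
        used += 1
    return issue_times

# Five requests, at most two issued per 1.0-time-unit window.
print(regulate([0.0, 0.1, 0.2, 0.3, 1.1], budget=2, period=1.0))
# -> [0.0, 0.1, 1.0, 1.0, 2.0]
```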
22.2 Power, Thermal and Performance Management for Advanced Computing Systems
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Pascal Vivet, CEA-LIST, FR
Session co-chair:
Andrea Bartolini, Bologna University, IT
This session discusses power and temperature management and performance gain for computing systems. The first two papers aim to advance energy management for energy-harvesting wearable devices and multi-core systems using federated reinforcement learning. The following two papers present thermal management methods for processor systems, focusing on 3D integration and cache contention modeling, respectively. The last paper boosts performance with smart cache prefetching.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 22.2.1 | DIET: A DYNAMIC ENERGY MANAGEMENT APPROACH FOR WEARABLE HEALTH MONITORING DEVICES Speaker: Nuzhat Yamin, Washington State University, US Authors: Nuzhat Yamin, Ganapati Bhat and Jana Doppa, Washington State University, US Abstract Wearable devices are becoming increasingly popular for health and activity monitoring applications. These devices typically include small rechargeable batteries to improve user comfort. However, the small battery capacity leads to limited operating life, requiring frequent recharging. Recent research has proposed energy harvesting using light and user motion to improve the lifetime of wearable devices. Most energy harvesting approaches assume that the placement of the energy harvesting device and the sensors required for health monitoring are the same. However, this assumption does not hold for several real-world applications. For example, motion energy harvesting using piezoelectric sensors is limited to the knees and elbows, while a sensor for heart rate monitoring must be placed on the chest for optimal performance. To address this challenge, we propose a novel dynamic energy management approach referred to as DIET for wearable health applications enabled by multiple sensors and energy harvesting devices. The key idea behind DIET is to harvest energy from multiple sources and optimally allocate it to each sensor using a lightweight optimization algorithm such that the overall utility for applications is maximized. Experiments on real-world data from four users over 30 days show that the DIET approach achieves utility within 10% of an offline oracle. (A generic utility-per-energy allocation sketch follows this session's table.) |
15:44 CET | 22.2.2 | IMPROVE THE STABILITY AND ROBUSTNESS OF POWER MANAGEMENT THROUGH MODEL-FREE DEEP REINFORCEMENT LEARNING Speaker: Lin Chen, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, HK Authors: Lin Chen1, Xiao Li2 and Jiang Xu3 1Electronic and Computer Engineering Department, AI Chip Center for Emerging Smart Systems, The Hong Kong University of Science and Technology, HK; 2Electronic and Computer Engineering Department, The Hong Kong University of Science and Technology, HK; 3Microelectronics Thrust, Electronic and Computer Engineering Department, AI Chip Center for Emerging Smart Systems, The Hong Kong University of Science and Technology, HK Abstract Achieving high performance with low energy consumption has become a primary design objective in multi-core systems. Recently, power management based on reinforcement learning has shown great potential in adapting to dynamic environments without much prior knowledge. However, conventional Q-learning (QL) algorithms adopted in most existing works encounter serious problems with scalability, instability, and overestimation. In this paper, we present a deep reinforcement learning-based approach to improve the stability and robustness of power management while reducing the energy-delay product (EDP) under user-specified performance requirements. The comprehensive status of the system is monitored periodically, making our controller sensitive to environmental changes. To further improve the learning effectiveness, knowledge sharing among multiple devices is implemented in our approach. Experimental results on multiple realistic applications show that the proposed method can reduce instability by up to 68% compared with QL. Through knowledge sharing among multiple devices, our federated approach achieves around 4.8% EDP improvement over QL on average. |
15:48 CET | 22.2.3 | (Best Paper Award Candidate) COREMEMDTM: INTEGRATED PROCESSOR CORE AND 3D MEMORY DYNAMIC THERMAL MANAGEMENT FOR IMPROVED PERFORMANCE Speaker: Lokesh Siddhu, Indian Institute of Technology, Delhi, IN Authors: Lokesh Siddhu1, Rajesh Kedia1 and Preeti Ranjan Panda2 1Indian Institute of Technology Delhi, IN; 2Indian Institute of Technology, Delhi, IN Abstract The growing performance of processors and 3D memories has resulted in higher power densities and temperatures. Dynamic thermal management (DTM) policies for processor cores and memory have received significant research attention, but existing solutions address processors and 3D memories independently, which causes overcompensation, and there is a need to coordinate the DTM of the two subsystems. Further, existing CPU DTM policies slow down heated cores significantly, increasing the overall execution time and performance overheads. We propose CoreMemDTM, a technique for integrating processor core and 3D memory DTM policies that attempts to minimize performance overheads. We suggest employing DTM depending on the thermal margin since safe temperature thresholds might differ for the two subsystems. We propose a stall-balanced core DVFS policy for core thermal management that enables distributed cooling, decreasing overheads. We evaluate CoreMemDTM using ten different SPEC CPU2017 workloads across various safe temperature thresholds and observe average execution time and energy improvements of 14% and 36% compared to state-of-the-art thermal management policies. |
15:52 CET | 22.2.4 | THERMAL- AND CACHE-AWARE RESOURCE MANAGEMENT BASED ON ML-DRIVEN CACHE CONTENTION PREDICTION Speaker: Mohammed Bakr Sikal, Karlsruhe Institute of Technology, DE Authors: Mohammed Bakr Sikal1, Heba Khdr1, Martin Rapp1 and Joerg Henkel2 1Karlsruhe Institute of Technology, DE; 2Karlsruhe Institute of Technology, DE Abstract While on-chip many-core systems enable a large number of applications to run in parallel, the increased overall performance may come at the cost of complicating the performance constraints of individual applications due to contention on shared resources. For instance, the competition for last-level cache by concurrently-running applications may lead to slowing down the execution and to potentially violating individual performance constraints. Clustered many-cores reduce cache contention at chip level by sharing caches only at cluster level. To reduce cache contention within a cluster, state-of-the-art techniques aim to co-map a memory-intensive application with a compute-intensive application onto one cluster. However, compute-intensive applications typically consume high power, and therefore, executing another application on their nearby cores may lead to high temperatures. Hence, there is a trade-off between cache contention and temperature. This paper is the first to consider this trade-off through a novel thermal- and cache-aware resource management technique. We build a neural network (NN)-based model to predict the slowdown of the application execution induced by cache contention, feeding our resource management technique that then optimizes the application mapping and selects the voltage/frequency levels of the clusters to compensate for the potential contention-induced slowdown. Thereby, it meets the performance constraints, while minimizing temperature. Compared to the state of the art, our technique significantly reduces the temperature by 30% on average, while satisfying the performance constraints of all individual applications. |
15:56 CET | 22.2.5 | T-SKID: PREDICTING WHEN TO PREFETCH SEPARATELY FROM ADDRESS PREDICTION Speaker: Toru Koizumi, University of Tokyo, JP Authors: Toru Koizumi, Tomoki Nakamura, Yuya Degawa, Hidetsugu Irie, Shuichi Sakai and Ryota Shioya, University of Tokyo, JP Abstract Prefetching is an important technique for reducing the number of cache misses and improving processor performance, and thus various prefetchers have been proposed. Many prefetchers are focused on issuing prefetches sufficiently earlier than demand accesses to hide miss latency. In contrast, we propose the T-SKID prefetcher, which focuses on delaying prefetching. If a prefetcher issues prefetches for demand accesses too early, the prefetched line will be evicted before it is referenced. We found that existing prefetchers often issue such too-early prefetches, and this observation offers new opportunities to improve performance. To tackle this issue, T-SKID performs timing prediction independently of address prediction. In addition to issuing prefetches sufficiently early as existing prefetchers do, T-SKID can delay the issue of prefetches until an appropriate time if necessary. We evaluated T-SKID by simulations using SPEC CPU 2017. The result shows that T-SKID achieves a 5.6% performance improvement in a multi-core environment, compared to Instruction Pointer Classifier based Prefetching, which is a state-of-the-art prefetcher. |
16:00 CET | 22.2.6 | Q&A SESSION Authors: Pascal Vivet1 and Andrea Bartolini2 1CEA-Leti, FR; 2University of Bologna, IT Abstract Questions and answers with the authors |
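As a rough illustration of the allocation problem behind DIET in 22.2.1 (and not the paper's optimization algorithm), the sketch below greedily spends a harvested-energy budget on the sensors with the highest utility per unit of energy. The sensor names, costs and utilities are made up.

```python
# Illustrative only: greedy allocation of a harvested-energy budget across
# sensors by utility per unit of energy (a knapsack-style heuristic). This is
# NOT the DIET algorithm of paper 22.2.1; all names and numbers are invented.

def allocate(budget_mj, sensors):
    """sensors: dict name -> (energy_cost_mJ, utility). Returns the chosen
    sensors and the leftover budget."""
    chosen, remaining = [], budget_mj
    ranked = sorted(sensors.items(),
                    key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
    for name, (cost, utility) in ranked:
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen, remaining

sensors = {"heart_rate": (3.0, 9.0), "spo2": (5.0, 10.0), "accel": (1.0, 2.5)}
print(allocate(budget_mj=6.0, sensors=sensors))  # -> (['heart_rate', 'accel'], 2.0)
```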
22.3 Compute in- and near-memory
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Jean-Philippe Noel, CEA, FR
Session co-chair:
Pierre-Emmanuel Gaillardon, University of Utah, US
This session deals with design issues around the concepts of in- and near-memory computing. These range from optimizing digital synthesis for crossbar-based IMC to optimizing the analog design of both RRAM-based IMC and MRAM-based NMC circuits. In addition, the issue of non-volatility of data in cache memories is also tackled with innovative solutions.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 22.3.1 | LIM-HDL: HDL-BASED SYNTHESIS FOR IN-MEMORY COMPUTING Speaker: Saman Froehlich, University of Bremen, DE Authors: Saman Froehlich1 and Rolf Drechsler2 1Department of Mathematics and Computer Science, University of Bremen, DE; 2University of Bremen/DFKI, DE Abstract HDLs are widely used in EDA for abstract specification and synthesis of logic circuits, as well as validation by simulation or formal verification techniques. Despite the popularity and the many benefits of HDL-based synthesis, it has not yet been performed for in-memory computing. Hence, there is a need to design a particular HDL which supplies efficient and compatible descriptions. In this paper, we enable HDL-based synthesis for the Programmable Logic-in-Memory (PLiM) computer architecture. The starting point to allow HDL-based synthesis for the PLiM computer architecture is to provide abstract descriptions of the final program similarly to the conventional logic synthesis approaches using standard HDLs such as VHDL or Verilog. We present LiM-HDL - a Verilog-based HDL - which allows for the detailed description of programs for in-memory computation. Having the description given in LiM-HDL, we propose a synthesis scheme which translates the description into PLiM programs, i.e., a sequence of resistive majority operations. This includes lexical and syntax analysis as well as preprocessing, custom levelization and a compiler. In our experiments, we show the benefits of LiM-HDL compared to classical Verilog-based synthesis. We show in a case study that LiM-HDL can be used to implement programs with respect to constraints of specific applications such as edge computing in IoT, for which the PLiM computer is of particular interest and where low area is a key requirement. In our case study, we show that we can reduce the number of ReRAM devices needed for the computation of an encryption module by 69%. |
15:44 CET | 22.3.2 | TRIPLE-SKIPPING NEAR-MRAM COMPUTING FRAMEWORK FOR AIOT ERA Speaker: Juntong Chen, Southeast University, CN Authors: Juntong Chen, Hao Cai, Bo Liu and Jun Yang, Southeast University, CN Abstract The near-memory computing (NMC) paradigm shows great significance in non-von Neumann architectures to reduce data movement. The normally-off and instant-on characteristics of spin-transfer torque magnetic random access memory (STT-MRAM) promise energy-efficient storage in the AIoT era. To avoid unnecessary memory-related processing, we propose a novel write-read-calculation triple-skipping (TS) NMC framework for multiply-accumulate (MAC) operations with minimally modified peripheral circuits. The proposed TS-NMC is evaluated with a custom microcontroller unit (MCU) in a 28-nm high-K metal gate (HKMG) CMOS process and a foundry-announced universal two-transistor two-magnetic-tunnel-junction (2T-2MTJ) MRAM cell. The framework consists of a sparse flag, defined in extra STT-MRAM columns with only 0.73% area overhead, and a calculation block for the NMC logic with 9.9% overhead. The TS-NMC works at a 0.6-V supply voltage at 20 MHz. The framework offers up to ∼95.6% energy saving compared to commercial SRAM on the ultra-low-power benchmark (ULPBenchmark). A classification task on MNIST takes 13 nJ/pattern. With the TS scheme, memory-access energy, calculation energy, and total energy are reduced by 52.49×, 2.7×, and 11.3×, respectively. (An illustrative sketch of operand skipping in a MAC kernel follows this session's table.) |
15:48 CET | 22.3.3 | ACHIEVING CRASH CONSISTENCY BY EMPLOYING PERSISTENT L1 CACHE Speaker: Akshay Krishna Ramanathan, Pennsylvania State University, US Authors: Akshay Krishna Ramanathan1, Sara Mahdizadeh Shahri2, Yi Xiao1 and Vijaykrishnan Narayanan1 1Pennsylvania State University, US; 2University of Michigan, US Abstract Emerging non-volatile memory technologies promise the opportunity for maintaining persistent data in memory. However, providing crash consistency in such systems can be costly, as any update to the persistent data has to reach the persistent domain in a specific order, imposing high overhead. Prior works proposed solutions in both software (SW) and hardware (HW) to address this problem but fall short of removing this overhead completely. In this work, we propose a Non-Volatile Cache (NVC) architecture design that employs a hybrid volatile, non-volatile memory cell employing monolithic 3D and ferroelectric technology in the L1 data cache to guarantee crash consistency with almost no performance overhead. We show that NVC achieves up to 5.1x speedup over state-of-the-art (SOTA) SW undo logging and 11% improvement over the SOTA HW solution without yielding the conventional architecture, while incurring 7% hardware overhead. |
15:52 CET | 22.3.4 | REFERENCING-IN-ARRAY SCHEME FOR RRAM-BASED CIM ARCHITECTURE Speaker: Abhairaj Singh, Delft University of Technology, NL Authors: Abhairaj Singh, Rajendra Bishnoi and Said Hamdioui, Delft University of Technology, NL Abstract Resistive random access memory (RRAM) based computation-in-memory (CIM) architectures are attracting a lot of attention due to their potential in performing fast and energy-efficient computing. However, RRAM variability and non-idealities limit the computing accuracy of such architectures, especially for multi-operand logic operations. This paper proposes a voltage-based differential referencing-in-array scheme that enables accurate two- and multi-operand logic operations for RRAM-based CIM architectures. The scheme makes use of a 2T2R cell configuration to create a complementary bitcell structure that inherently also acts as a reference during operation execution; this results in a high sensing margin. Moreover, the variation-sensitive multi-operand (N)AND operation is implemented using a complementary-input (N)OR operation to further improve its accuracy. Simulation results for a post-layout extracted 512x512 (256Kb) RRAM-based CIM array show that (N)OR/(N)AND operations with up to 56 operands can be accurately and reliably performed, as opposed to a maximum of 4 operands supported by state-of-the-art solutions, while offering up to 11.4X better energy efficiency. |
15:56 CET | 22.3.5 | Q&A SESSION Authors: Jean-Philippe Noel1 and Pierre-Emmanuel Gaillardon2 1CEA, FR; 2University of Utah, US Abstract Questions and answers with the authors |
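To convey the arithmetic intent behind the skipping scheme in 22.3.2, here is a hedged software sketch of a multiply-accumulate loop that skips work whenever a stored operand is flagged as zero. The actual triple-skipping mechanism lives in the MRAM macro's peripheral circuitry and also skips writes and reads; the data below are invented.

```python
# Illustrative only: operand skipping in a multiply-accumulate (MAC) kernel,
# mimicking the role of a sparse flag that marks zero operands. The real
# write-read-calculation triple-skipping of paper 22.3.2 is implemented in
# the MRAM macro's peripheral circuitry, not in software.

def mac_with_skipping(weights, activations):
    sparse_flags = [w == 0 for w in weights]   # analogous to the sparse-flag column
    acc, skipped = 0, 0
    for w, a, is_zero in zip(weights, activations, sparse_flags):
        if is_zero:                            # skip read, multiply and accumulate
            skipped += 1
            continue
        acc += w * a
    return acc, skipped

print(mac_with_skipping([0, 3, 0, -1, 2], [5, 1, 7, 4, 2]))  # -> (3, 2)
```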
22.4 Formal Methods in Design and Verification of Software and Hardware Systems
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 15:40 CET - 16:30 CET
Session chair:
Stefano Quer, Politecnico di Torino, IT
Session co-chair:
Christoph Scholl, University Freiburg, DE
The ever-growing complexity of software and hardware systems requires an increasing level of automation and more scalable design and verification methods. We will learn how Bounded Model Checking can be combined with Coverage Guided Fuzzing into an efficient and effective tool for software verification. We will be introduced to an FPGA-based swarm verification engine that acts as a model checker capable of proving liveness properties. Multiplier verification is pushed forward through the clever use of dual variables and tail substitution in the algebraic encoding. Finally, BDDs remain on the scene: for some applications, it is shown how to construct provably optimal variable orders in polynomial time.
Time | Label | Presentation Title Authors |
---|---|---|
15:40 CET | 22.4.1 | (Best Paper Award Candidate) BMC+FUZZ : EFFICIENT AND EFFECTIVE TEST GENERATION Speaker: Ravindra Metta, TCS, IN Authors: Ravindra Metta1, Raveendra Medicherla1 and Samarjit Chakraborty2 1TCS, IN; 2UNC Chapel Hill, US Abstract Coverage Guided Fuzzing (CGF) is a greybox test generation technique. Bounded Model Checking (BMC) is a whitebox test generation technique. Both of these have been highly successful at program coverage as well as error detection. It is well known that CGF fails to cover complex conditionals and deeply nested program points. BMC, on the other hand, fails to scale for programming features such as large loops and arrays. To alleviate the above problems, we propose (1) to combine BMC and CGF by using BMC for a short and potentially incomplete unwinding of a given program to generate effective initial test prefixes, which are then extended into complete test inputs for CGF to fuzz, and (2) in case BMC gets stuck even for the short unwinding, to automatically identify the reason and rerun BMC with a corresponding remedial strategy. We call this approach BMCFuzz and implemented it in the VeriFuzz framework. This implementation was experimentally evaluated by participating in Test-Comp 2021, and the results show that BMCFuzz is both effective and efficient at covering branches as well as exposing errors. In this paper, we present the details of BMCFuzz and our analysis of the experimental results. |
15:44 CET | 22.4.2 | DOLMEN: FPGA SWARM FOR SAFETY AND LIVENESS VERIFICATION Speaker: Emilien Fournier, ENSTA Bretagne, FR Authors: Emilien Fournier, Ciprian Teodorov and Loïc Lagadec, ENSTA Bretagne, FR Abstract To ensure correctness of critical systems, swarm verification produces proofs of failure on systems too large to be verified using model-checking. Recent research efforts exploit both intrinsic parallelism and low-latency on-chip memory offered by FPGAs to achieve 3 orders of magnitude speedups over software. However, these approaches are limited to safety verification that encodes only what the system should not do. Liveness properties express what the system should do, and are widely used in the verification of operating systems, distributed systems, and communication protocols. Both safety and liveness properties are of paramount importance to ensure systems correctness. This paper presents Dolmen, the first FPGA implementation of a swarm verification engine that supports both safety and liveness properties. Dolmen features a deeply pipelined verification core, along with a scalable architecture to allow high-frequency synthesis on large FPGAs. Our experimental results, on a Xilinx Virtex Ultrascale+ FPGA, show that the Dolmen architecture can achieve up to 4 orders of magnitude speedups compared to software model-checking. |
15:48 CET | 22.4.3 | ADDING DUAL VARIABLES TO ALGEBRAIC REASONING FOR GATE-LEVEL MULTIPLIER VERIFICATION Speaker: Daniela Kaufmann, Johannes Kepler University Linz, AT Authors: Daniela Kaufmann1, Paul Beame2, Armin Biere3 and Jakob Nordström4 1Johannes Kepler University Linz, AT; 2University of Washington, US; 3Albert-Ludwigs-University Freiburg, DE; 4Københavns Universitet (DIKU), DK Abstract Algebraic reasoning has proven to be one of the most effective approaches for verifying gate-level integer multipliers, but it struggles with certain components, necessitating the complementary use of SAT solvers. For this reason validation certificates require proofs in two different formats. Approaches to unify the certificates are not scalable, meaning that the validation results can only be trusted up to the correctness of compositional reasoning. We show in this paper that using dual variables in the algebraic encoding, together with a novel tail substitution and carry rewriting method, removes the need for SAT solvers in the verification flow and yields a single, uniform proof certificate. |
15:52 CET | 22.4.4 | ON THE OPTIMAL OBDD REPRESENTATION OF 2-XOR BOOLEAN AFFINE SPACES Speaker: Valentina Ciriani, Università degli Studi di Milano, IT Authors: Anna Bernasconi1, Valentina Ciriani2 and Marco Longhi2 1Università di Pisa, IT; 2Università degli Studi di Milano, IT Abstract A Reduced Ordered Binary Decision Diagram (ROBDD) is a data structure widely used in an increasing number of fields of Computer Science. In general, ROBDD representations of Boolean functions have a tractable size, polynomial in the number of input variables, for many practical applications. However, the size of a ROBDD, and consequently the complexity of its manipulation, strongly depends on the variable ordering: depending on the initial ordering of the input variables, the size of a ROBDD representation can grow from linear to exponential. In this paper, we study the ROBDD representation of Boolean functions that describe a special class of Boolean affine spaces, which play an important role in some logic synthesis applications. We first discuss how the ROBDD representations of these functions are very sensitive to variable ordering, and then we provide an efficient linear-time algorithm for computing an optimal variable ordering that always guarantees a ROBDD of size linear in the number of input variables. (A standard textbook illustration of this ordering sensitivity follows this session's table.) |
15:56 CET | 22.4.5 | Q&A SESSION Authors: Stefano Quer1 and Christoph Scholl2 1Politecnico di Torino, IT; 2University Freiburg, DE Abstract Questions and answers with the authors |
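The ordering sensitivity studied in 22.4.4 can be appreciated through a standard textbook example (not taken from the paper): a conjunction of 2-XOR constraints, whose satisfying assignments form an affine space over GF(2), has a linear-size ROBDD under one variable order and an exponential-size ROBDD under another.

```latex
% Standard illustration (not from the paper): ordering sensitivity of ROBDDs
% for a conjunction of 2-XOR constraints.
\[
  f(x_1,\dots,x_n,y_1,\dots,y_n) \;=\; \bigwedge_{i=1}^{n} \left( x_i \oplus y_i \right)
\]
% Interleaved order x_1 < y_1 < x_2 < y_2 < \dots < x_n < y_n:
%   after reading x_i, only its value must be remembered until y_i is read,
%   so the ROBDD has O(n) nodes.
% Separated order x_1 < \dots < x_n < y_1 < \dots < y_n:
%   each of the 2^n assignments of (x_1,\dots,x_n) yields a distinct
%   subfunction of (y_1,\dots,y_n), so the ROBDD has at least 2^n nodes.
```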
23.1 Artificial Intelligence for embedded systems in healthcare
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Marina Zapater, University of Applied Sciences Western Switzerland, CH
Session co-chair:
Daniele Pagliari, Politecnico di Torino, IT
Health-related applications need more and more intelligence at the edge to process data efficiently. This session will explain how artificial intelligence can help.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 23.1.1 | (Best Paper Award Candidate) BIOFORMERS: EMBEDDING TRANSFORMERS FOR ULTRA-LOW POWER SEMG-BASED GESTURE RECOGNITION Speaker: Alessio Burrello, University of Bologna, IT Authors: Alessio Burrello1, Francesco Bianco Morghet2, Moritz Scherer3, Simone Benatti4, Luca Benini5, Enrico Macii2, Massimo Poncino2 and Daniele Jahier Pagliari2 1Department of Electrical and Electronic Engineering, University of Bologna, IT; 2Politecnico di Torino, IT; 3ETH Zürich, CH; 4University of Bologna, IT; 5Università di Bologna and ETH Zürich, IT Abstract Human-machine interaction is gaining traction in rehabilitation tasks, such as controlling prosthetic hands or robotic arms. Gesture recognition exploiting surface electromyographic (sEMG) signals is one of the most promising approaches, given that sEMG signal acquisition is non-invasive and is directly related to muscle contraction. However, the analysis of these signals still presents many challenges, since similar gestures result in similar muscle contractions. Thus the resulting signal shapes are almost identical, leading to low classification accuracy. To tackle this challenge, complex neural networks are employed, which require large memory footprints, consume relatively high energy and limit the maximum battery life of devices used for classification. This work addresses this problem with the introduction of the Bioformers, a new family of ultra-small attention-based architectures that approaches state-of-the-art performance while reducing the number of parameters and operations by 4.9X. Additionally, by introducing a new inter-subject pre-training, we improve the accuracy of our best Bioformer by 3.39%, matching state-of-the-art accuracy without any additional inference cost. Deploying our best performing Bioformer on a Parallel, Ultra-Low Power (PULP) microcontroller unit (MCU), the GreenWaves GAP8, we achieve an inference latency and energy of 2.72 ms and 0.14 mJ, respectively, 8.0X lower than the previous state-of-the-art neural network, while occupying just 94.2 kB of memory. |
16:44 CET | 23.1.2 | INCLASS: INCREMENTAL CLASSIFICATION STRATEGY FOR SELF-AWARE EPILEPTIC SEIZURE DETECTION Speaker: Lorenzo Ferretti, University of California Los Angeles (UCLA), US Authors: Lorenzo Ferretti1, Giovanni Ansaloni2, Renaud Marquis3, Tomas Teijeiro4, Philippe Ryvlin3, David Atienza4 and Laura Pozzi5 1University of California Los Angeles, US; 2EPFL, CH; 3CHUV, CH; 4École Polytechnique Fédérale de Lausanne (EPFL), CH; 5USI Lugano, CH Abstract Wearable Health Companions allow the unobtrusive monitoring of patients affected by chronic conditions. In particular, by acquiring and interpreting bio-signals, they enable the detection of acute episodes in cardiac and neurological ailments. Nevertheless, the processing of bio-signals is computationally complex, especially when a large number of features are required to obtain reliable detection outcomes. Addressing this challenge, we present a novel methodology, named INCLASS, that iteratively extends the employed feature sets at run-time, until a confidence condition is satisfied. INCLASS builds such sets at design time based on code analysis and profiling information. When applied to the challenging scenario of detecting epileptic seizures based on ECG and SpO2 acquisitions, INCLASS obtains savings of up to 54%, while incurring a negligible loss of detection performance (1.1% degradation of specificity and sensitivity) with respect to always computing and evaluating all features. (A generic sketch of confidence-driven incremental classification follows this session's table.) |
16:48 CET | 23.1.3 | AMSER: ADAPTIVE MULTI-MODAL SENSING FOR ENERGY EFFICIENT AND RESILIENT EHEALTH SYSTEMS Speaker: Emad Kasaeyan Naeini, University of California, Irvine, US Authors: Emad Kasaeyan Naeini1, Sina Shahhosseini1, Anil Kanduri2, Pasi Liljeberg2, Amir M. Rahmani1 and Nikil Dutt1 1University of California Irvine, US; 2University of Turku, FI Abstract eHealth systems deliver critical digital healthcare and wellness services for users by continuously monitoring physiological and contextual data. eHealth applications use multi-modal machine learning kernels to analyze data from different sensor modalities and automate decision-making. Noisy inputs and motion artifacts during sensory data acquisition affect the i) prediction accuracy and resilience of eHealth services and ii) energy efficiency in processing garbage data. Monitoring raw sensory inputs to identify and drop data and features from noisy modalities can improve prediction accuracy and energy efficiency. We propose a closed-loop monitoring and control framework for multi-modal eHealth applications, AMSER, that can mitigate garbage-in garbage-out by i) monitoring input modalities, ii) analyzing raw input to selectively drop noisy data and features, and iii) choosing appropriate machine learning models that fit the configured data and feature vector - to improve prediction accuracy and energy efficiency. We evaluate our AMSER approach using multi-modal eHealth applications of pain assessment and stress monitoring over different levels and types of noisy components incurred via different sensor modalities. Our approach achieves up to 22% improvement in prediction accuracy and 5.6x energy consumption reduction in the sensing phase against the state-of-the-art multi-modal monitoring application. |
16:52 CET | 23.1.4 | Q&A SESSION Authors: Marina Zapater1 and Daniele Jahier Pagliari2 1University of Applied Sciences Western Switzerland (HES-SO), CH; 2Politecnico di Torino, IT Abstract Questions and answers with the authors |
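To illustrate the generic pattern behind INCLASS in 23.1.2 (not the paper's actual feature sets, classifier or thresholds), the sketch below accumulates a decision score from progressively more expensive feature groups and stops as soon as a confidence margin is reached.

```python
# Illustrative only: confidence-driven incremental classification. Cheap
# feature groups are evaluated first; more expensive ones are computed only
# while the accumulated score is not yet confident. The scoring functions,
# threshold and data are invented, not those of INCLASS (paper 23.1.2).
import numpy as np

def incremental_classify(feature_groups, score_fns, confidence=2.0):
    """feature_groups: feature vectors, cheapest first; one score_fn each.
    Positive accumulated score -> class 1. Returns (class, groups used)."""
    score, used = 0.0, 0
    for feats, fn in zip(feature_groups, score_fns):
        score += fn(feats)
        used += 1
        if abs(score) >= confidence:       # confident enough: stop early
            break
    return int(score > 0), used

# Hypothetical two-stage example with linear scoring functions.
g1, g2 = np.array([1.2, -0.3]), np.array([0.5, 0.9, -1.1])
fns = [lambda f: float(f @ np.array([1.0, 0.5])),
       lambda f: float(f @ np.array([0.2, 1.0, 0.3]))]
print(incremental_classify([g1, g2], fns))  # e.g. (1, 2)
```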
23.2 Performance Evaluation & Optimization using Modeling, Simulation & Benchmarking
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Avi Ziv, IBM, IL
Session co-chair:
Daniel Grosse, Johannes Kepler University, AT
This session introduces solutions that increase the accuracy and/or the speed of assessing the performance of future designs. The solutions cover simple and accurate modeling of the delay of multi-input gates, high-level modeling of the non-idealities of computation-in-memory components, a platform to assess PiMs, and a method to decide on the quantization of DNNs.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 23.2.1 | A SIMPLE HYBRID MODEL FOR ACCURATE DELAY MODELING OF A MULTI-INPUT GATE Speaker: Arman Ferdowsi, TU Wien, AT Authors: Arman Ferdowsi, Juergen Maier, Daniel Oehlinger and Ulrich Schmid, TU Wien, AT Abstract Faithfully representing small delay variations caused by transitions on different inputs in close temporal proximity is a challenging task for digital circuit delay models. In this paper, we show that a simple hybrid model, derived from considering transistors as ideal switches in a simple RC model, leads to a surprisingly accurate model. By analytically solving the resulting ODEs for a NOR gate, explicit expressions for the delay are derived. In addition, we experimentally compare our model's predictions to SPICE simulations and to existing delay models. (A textbook first-order RC delay sketch follows this session's table.) |
16:44 CET | 23.2.2 | SYSCIM: SYSTEMC-AMS SIMULATION OF MEMRISTIVE COMPUTATION IN MEMORY Speaker: Ali BanaGozar, Eindhoven University of Technology, NL Authors: Seyed Hossein Hashemi Shadmehri1, Ali BanaGozar2, Mehdi Kamal1, Sander Stuijk2, Ali Afzali-Kusha1, Massoud Pedram3 and Henk Corporaal4 1University of Tehran, IR; 2Eindhoven University of Technology, NL; 3USC, US; 4TU/e (Eindhoven University of Technology), NL Abstract Computation-in-memory (CIM) is one of the most appealing computing paradigms, especially for implementing artificial neural networks. Non-volatile memories like ReRAMs, PCMs, etc., have proven to be promising candidates for the realization of CIM processors. However, these devices and their driving circuits are subject to non-idealities. This paper presents a comprehensive platform, named SySCIM, for simulating memristor-based CIM systems. SySCIM considers the impact of the non-idealities of the CIM components, including memristor device, memristor crossbar (interconnects), analog-to-digital converter, and transimpedance amplifier, on the vector-matrix multiplication performed by the CIM unit. The CIM modules are described in SystemC and SystemC-AMS to reach a higher simulation speed while maintaining high simulation accuracy. Experiments under different crossbar sizes show that SySCIM performs simulations up to 117x faster than HSPICE with less than 4% accuracy loss. The modular design of SySCIM provides researchers with an easy design-space exploration tool to investigate the effects of various non-idealities. |
16:48 CET | 23.2.3 | PIMULATOR: A FAST AND FLEXIBLE PROCESSING-IN-MEMORY EMULATION PLATFORM Speaker: Sergiu Mosanu, University of Virginia, US Authors: Sergiu Mosanu, Mohammad Nazmus Sakib, Tommy Tracy II, Ersin Cukurtas, Alif Ahmed, Preslav Ivanov, Samira Khan, Kevin Skadron and Mircea Stan, University of Virginia, US Abstract Motivated by the memory wall problem, researchers propose many new Processing-in-Memory (PiM) architectures to bring computation closer to data. However, evaluating the performance of these emerging architectures involves using a myriad of tools, including circuit simulators, behavioral RTL or software simulation models, hardware approximations, etc. It is challenging to mimic both software and hardware aspects of a PiM architecture using the currently available tools with high performance and fidelity. Until and unless actual products that include PiM become available, the next best thing is to emulate various hardware PiM solutions on FPGA fabric and boards. This paper presents a modular, parameterizable, FPGA synthesizable soft PiM model suitable for prototyping and rapid evaluation of Processing-in-Memory architectures. The PiM model is implemented in System Verilog and allows users to generate any desired memory configuration on the FPGA fabric with complete control over the structure and distribution of the PiM logic units. Moreover, the model is compatible with the LiteX framework, which provides a high degree of usability and compatibility with the FPGA and RISC-V ecosystem. Thus, the framework enables architects to easily prototype, emulate and evaluate a wide range of emerging PiM architectures and designs. We demonstrate strategies to model several pioneering bitwise-PiM architectures and provide detailed benchmark performance results that demonstrate the platform's ability to facilitate design space exploration. We observe an emulation vs. simulation weighted-average speedup of 28x when running a memory benchmark workload. The model can utilize 100% BRAM and only 1% FF and LUT of an Alveo U280 FPGA board. The project is entirely open-source. |
16:52 CET | 23.2.4 | BENQ: BENCHMARKING AUTOMATED QUANTIZATION ON DEEP NEURAL NETWORK ACCELERATORS Speaker: Zheng Wei, Xi’an Jiaotong University, CN Authors: Zheng Wei1, Xingjun Zhang1, Jingbo Li2, Zeyu Ji1 and Jia Wei2 1Xi’an Jiaotong University, CN; 2Xi'an Jiaotong University, CN Abstract Hardware-aware automated quantization promises to unlock an entirely new algorithm-hardware co-design paradigm for efficiently accelerating deep neural network (DNN) inference by incorporating the hardware cost into the reinforcement learning (RL) -based quantization strategy search process. Existing works usually design an automated quantization algorithm targeting one hardware accelerator with a device-specific performance model or pre-collected data. However, determining the hardware cost is non-trivial for algorithm experts due to their lack of cross-disciplinary knowledge in computer architecture, compiler, and physical chip design. Such a barrier limits reproducibility and fair comparison. Moreover, it is notoriously challenging to interpret the results due to the lack of quantitative metrics. To this end, we first propose BenQ, which includes various RL-based automated quantization algorithms with aligned settings and encapsulates two off-the-shelf performance predictors with standard OpenAI Gym API. Then, we leverage cosine similarity and manhattan distance to interpret the similarity between the searched policies. The experiments show that different automated quantization algorithms can achieve near equivalent optimal trade-offs because of the high similarity between the searched policies, which provides insights for revisiting the innovations in automated quantization algorithms. |
16:56 CET | 23.2.5 | Q&A SESSION Authors: Avi Ziv1 and Daniel Grosse2 1IBM Research - Haifa, IL; 2Johannes Kepler University Linz, AT Abstract Questions and answers with the authors |
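As an illustration of the policy-similarity analysis described in the BenQ abstract (23.2.4), the following minimal Python sketch compares two per-layer bitwidth policies with cosine similarity and Manhattan distance, the two metrics named in the abstract. The policies, values, and helper names are hypothetical examples for illustration only, not data or code from the paper.

```python
# Illustrative sketch only: comparing two hypothetical per-layer quantization
# policies (bitwidth vectors) with cosine similarity and Manhattan distance.
# The policies below are made-up examples, not results from the BenQ paper.
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two policy vectors (1.0 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def manhattan_distance(a, b):
    """Sum of absolute per-layer bitwidth differences (0.0 = identical policies)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

# Hypothetical per-layer weight bitwidths found by two RL-based search runs.
policy_a = [8, 6, 4, 4, 6, 8]
policy_b = [8, 6, 4, 5, 6, 8]

print(cosine_similarity(policy_a, policy_b))   # close to 1.0 -> very similar policies
print(manhattan_distance(policy_a, policy_b))  # small distance -> near-equivalent trade-off
```

A high cosine similarity together with a small Manhattan distance would, in this reading, indicate that two search algorithms converged to near-equivalent quantization policies.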
23.3 New Methods and Tools using Machine Learning
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Muhammad Shafique, New York University Abu Dhabi, AE
Session co-chair:
Smail Niar, Université Polytechnique Hauts-de-France, FR
This session includes four regular and two IP papers that improve the state of the art of methods and tools for Machine Learning. Among the regular papers, the first one presents a new approach for graph classification with hyperdimensional computing, the second one takes inspiration from deep learning techniques in natural language processing to improve energy estimation, the third one presents an in situ training framework for memristive crossbar structures, and the fourth one offers new perspectives on compression algorithms for quantized neural networks. Among the IP papers, the first one proposes a neural approach to improving thermal management, while the second one presents a training method that uses bit gradients to obtain mixed-precision quantized models.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 23.3.1 | (Best Paper Award Candidate) GRAPHHD: EFFICIENT GRAPH CLASSIFICATION USING HYPERDIMENSIONAL COMPUTING Speaker: Igor Nunes, University of California, Irvine, US Authors: Igor Nunes, Mike Heddes, Tony Givargis, Alex Nicolau and Alex Veidenbaum, University of California, Irvine, US Abstract Hyperdimensional Computing (HDC), developed by Kanerva, is a computational model for machine learning inspired by neuroscience. HDC exploits characteristics of biological neural systems such as high dimensionality, randomness and a holographic representation of information to achieve a good balance between accuracy, efficiency and robustness. HDC models have already been proven to be useful in different learning applications, especially in resource-limited settings such as the increasingly popular Internet of Things (IoT). One class of learning tasks that is missing from the current body of work on HDC is graph classification. Graphs are among the most important forms of information representation, yet, to this day, HDC algorithms have not been applied to the graph learning problem in a general sense. Moreover, graph learning in IoT and sensor networks, with their limited compute capabilities, introduces challenges to the overall design methodology. In this paper, we present GraphHD, a baseline approach for graph classification with HDC. We evaluate GraphHD on real-world graph classification problems. Our results show that, when compared to state-of-the-art Graph Neural Networks (GNNs), the proposed model achieves comparable accuracy, while training and inference times are on average 14.6X and 2.0X faster, respectively. (See the illustrative encoding sketch after this session's table.) |
16:44 CET | 23.3.2 | DEEPPM: TRANSFORMER-BASED POWER AND PERFORMANCE PREDICTION FOR ENERGY-AWARE SOFTWARE Speaker: Jun S. Shim, Seoul National University, KR Authors: Jun S. Shim1, Bogyeong Han1, Yeseong Kim2 and Jihong Kim1 1Seoul National University, KR; 2DGIST, KR Abstract Many system-level management and optimization techniques need accurate estimates of power consumption and performance. Earlier research has proposed many high-level/source-level estimation models, particularly for basic blocks. However, most of them still need to execute the target software at least once on a fine-grained simulator or real hardware to extract the required features. This paper proposes a performance/power prediction framework, called Deep Power Meter (DeepPM), which produces accurate estimates using only the compiled binary. Inspired by deep learning techniques in natural language processing, we convert the program instructions into vector form and predict the average power and performance of basic blocks with a transformer model. In addition, unlike existing works based on a Long Short-Term Memory (LSTM) model structure, which only work for basic blocks with a small number of instructions, DeepPM provides highly accurate results for long basic blocks, which take the majority of the execution time in actual application runs. In our evaluation with the SPEC2006 benchmark suite, we show that DeepPM can provide accurate predictions for performance and power consumption with 10.2% and 12.3% error, respectively. DeepPM also outperforms the LSTM-based model by up to 67.2% and 34.9% in terms of error for performance and power, respectively. |
16:48 CET | 23.3.3 | QUANTIZATION-AWARE IN-SITU TRAINING FOR RELIABLE AND ACCURATE EDGE AI Speaker: João Paulo de Lima, Federal University of Rio Grande do Sul, BR Authors: João Paulo de Lima and Luigi Carro, Federal University of Rio Grande do Sul, BR Abstract In-memory analog computation based on memristor crossbars has become the most promising approach for DNN inference. Because compute and memory requirements are larger during training, memristive crossbars are also an alternative for training DNN models within a feasible energy budget for edge devices, especially in light of trends towards security, privacy, latency, and energy reduction by avoiding data transfer over the Internet. To enable online training and inference on the same device, however, there are still challenges related to the different minimum bitwidths needed in each phase and to memristor non-idealities. We provide an in-situ training framework that allows the network to adapt to hardware imperfections while practically eliminating errors from weight quantization. We validate our methodology on image classification tasks, namely MNIST and CIFAR10, by training NN models with 8-bit weights and quantizing them to 2 bits. The training algorithm recovers up to 12% of the accuracy lost to quantization errors even under high variability, reduces training energy by up to 6x, and allows for energy-efficient inference using a single cell per synapse, hence enhancing robustness and accuracy for a smooth training-to-inference transition. |
16:52 CET | 23.3.4 | ENCORE COMPRESSION: EXPLOITING NARROW-WIDTH VALUES FOR QUANTIZED DEEP NEURAL NETWORKS Speaker: Myeongjae Jang, KAIST, KR Authors: Myeongjae Jang, Jinkwon Kim, Jesung Kim and Soontae Kim, KAIST, KR Abstract Deep Neural Networks (DNNs) have become a practical machine learning approach running on various Neural Processing Units (NPUs). For higher performance and lower hardware overheads, DNN datatype reduction through quantization has been proposed. Moreover, to solve the memory bottleneck caused by the large data size of DNNs, several zero-value-aware compression algorithms are used. However, these compression algorithms do not compress modern quantized DNNs well because of the decreased number of zero values. We find that the latest quantized DNNs have data redundancy due to frequent narrow-width values. Because low-precision quantization reduces DNN datatypes to a simple datatype with fewer bits, scattered DNN data are gathered into a small number of discrete values, which incurs a biased data distribution. Narrow-width values occupy a large proportion of the biased distribution. Moreover, the appropriate number of zero run-length bits can be changed dynamically according to DNN sparsity. Based on these observations, we propose a compression algorithm that exploits narrow-width values and variable zero run-lengths for quantized DNNs. In experiments with three quantized DNNs, our proposed scheme yields an average compression ratio of 2.99. (See the illustrative compression sketch after this session's table.) |
16:56 CET | 23.3.5 | Q&A SESSION Authors: Muhammad Shafique1 and Smail Niar2 1New York University Abu Dhabi, AE; 2Université Polytechnique Hauts-de-France, FR Abstract Questions and answers with the authors |
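The following minimal Python sketch illustrates the general hyperdimensional-computing recipe that GraphHD (23.3.1) builds on: assign random hypervectors to nodes, bind the two endpoints of each edge, and bundle the results into a single graph hypervector that can then be compared against class prototypes. The dimensionality, the bipolar encoding, and the toy graphs are illustrative assumptions and do not reproduce the paper's exact algorithm.

```python
# Minimal sketch of a generic HDC-style graph encoding, in the spirit of
# GraphHD (23.3.1). This is NOT the paper's exact algorithm: the dimension,
# the bipolar node hypervectors, and the bind/bundle choices are assumptions.
import numpy as np

D = 10_000                       # hypervector dimensionality
rng = np.random.default_rng(0)

def node_hv():
    """Random bipolar hypervector representing one node."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Bind two hypervectors (element-wise product for bipolar vectors)."""
    return a * b

def encode_graph(edges, node_hvs):
    """Bundle (sum) the bound edge hypervectors into one graph hypervector."""
    acc = np.zeros(D)
    for u, v in edges:
        acc += bind(node_hvs[u], node_hvs[v])
    return np.sign(acc)          # back to a (near-)bipolar representation

def similarity(x, y):
    """Normalized dot product; class prototypes would be compared this way."""
    return float(np.dot(x, y)) / D

# Toy example: a triangle vs. a path on the same three nodes.
hvs = {n: node_hv() for n in ("a", "b", "c")}
triangle = encode_graph([("a", "b"), ("b", "c"), ("a", "c")], hvs)
path = encode_graph([("a", "b"), ("b", "c")], hvs)
print(similarity(triangle, path))   # high but below 1: structurally related graphs
```

In a classification setting, one would bundle the hypervectors of all training graphs of a class into a prototype and assign a test graph to the most similar prototype; this sketch only shows the encoding step.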
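The sketch below illustrates, at a high level, the two ideas combined by EnCore compression (23.3.4): run-length encoding of zero values and a shorter code for frequent narrow-width values. The token format, field widths, and example data are invented for illustration; the paper's actual hardware encoding is not specified here.

```python
# Illustrative sketch of the two ideas in the EnCore abstract (23.3.4):
# run-length encoding of zeros plus a shorter code for narrow-width values.
# The (kind, payload) token format is invented for illustration and is not
# the encoding used in the paper.
def compress(values, narrow_bits=4):
    """Encode a list of unsigned quantized values as (kind, payload) tokens."""
    tokens, i = [], 0
    narrow_max = (1 << narrow_bits) - 1
    while i < len(values):
        if values[i] == 0:                       # collapse a run of zeros
            run = 0
            while i < len(values) and values[i] == 0:
                run += 1
                i += 1
            tokens.append(("zero_run", run))
        elif values[i] <= narrow_max:            # frequent narrow-width value
            tokens.append(("narrow", values[i]))
            i += 1
        else:                                    # fall back to the full width
            tokens.append(("full", values[i]))
            i += 1
    return tokens

def decompress(tokens):
    out = []
    for kind, payload in tokens:
        out.extend([0] * payload if kind == "zero_run" else [payload])
    return out

data = [0, 0, 0, 3, 1, 0, 0, 14, 200, 2]
tokens = compress(data)
assert decompress(tokens) == data
print(tokens)
```

A real implementation would additionally size the zero run-length field according to the observed sparsity, which is the "variable zero run-length" aspect the abstract mentions.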
23.4 Side-channel attacks and beyond
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 16:40 CET - 17:20 CET
Session chair:
Begül Bilgin, Rambus Cryptography Research, NL
Session co-chair:
Maria Mushtaq, Telecom Paristech, FR
This session covers a variety of attacks and defense mechanisms: a side-channel attack on DNNs, a thermal covert channel on Xeon processors, a power-based side-channel attack on homomorphic encryption, a mitigation technique against Rowhammer attacks, and a secure prefetcher against cache side-channel attacks.
Time | Label | Presentation Title Authors |
---|---|---|
16:40 CET | 23.4.1 | (Best Paper Award Candidate) PREFENDER: A PREFETCHING DEFENDER AGAINST CACHE SIDE CHANNEL ATTACKS AS A PRETENDER Speaker: Lang Feng, Nanjing University, CN Authors: Luyi Li1, Jiayi Huang2, Lang Feng1 and Zhongfeng Wang1 1Nanjing University, CN; 2University of California, Santa Barbara, US Abstract Cache side channel attacks are increasingly alarming in modern processors due to the recent emergence of Spectre and Meltdown attacks. A typical attack performs intentional cache access and manipulates cache states to leak secrets by observing the victim's cache access patterns. Different countermeasures have been proposed to defend against both general and transient execution based attacks. Despite their effectiveness, they all trade some level of performance for security. In this paper, we seek an approach to enforcing security while maintaining performance. We leverage the insight that attackers need to access cache in order to manipulate and observe cache state changes for information leakage. Specifically, we propose Prefender, a secure prefetcher that learns and predicts attack-related accesses for prefetching the cachelines to simultaneously help security and performance. Our results show that Prefender is effective against several cache side channel attacks while maintaining or even improving performance for SPEC CPU2006 benchmarks. |
16:44 CET | 23.4.2 | STEALTHY INFERENCE ATTACK ON DNN VIA CACHE-BASED SIDE-CHANNEL ATTACKS Speaker: Han Wang, University of California Davis, US Authors: Han Wang, Syed Mahbub Hafiz, Kartik Patwari, Chen-Nee Chuah, Zubair Shafiq and Houman Homayoun, University of California Davis, US Abstract The advancement of deep neural networks (DNNs) motivates their deployment in various domains, including image classification, disease diagnosis, voice recognition, etc. Since some tasks that DNNs undertake are very sensitive, the label information is confidential and carries commercial value or critical privacy. The leakage of label information can enable further crimes, like intentionally causing a collision with DNN-enabled autonomous systems, disrupting energy networks with DNN-based control systems, etc. This paper demonstrates that DNNs also bring a new security threat, leading to the leakage of label information of input instances to the DNN models. In particular, we leverage a cache-based side-channel attack (SCA), i.e., Flush+Reload, on the DNN (victim) models to observe the execution of their computation graphs and create a database of them for building a classifier that the attacker can use to determine the label information of (unknown) input instances for victim models. Then we deploy the cache-based SCA on the same host machine as the victim models and deduce the labels with the attacker’s classification model to compromise the privacy and confidentiality of the victim models. We explore different settings and classification techniques to achieve a high success rate in stealing label information from the victim models. Additionally, we consider two attack scenarios: a binary attack that distinguishes specific sensitive labels from all others, and a multi-class attack that recognizes all classes the victim DNNs provide. Last, we implement the attack on both static DNN models, with identical architectures for all inputs, and dynamic DNN models, which adapt their architectures to different inputs, to demonstrate the broad applicability of the proposed attack, including DenseNet 121, DenseNet 169, VGG 16, VGG 19, MobileNet v1, and MobileNet v2. Our experiments show that MobileNet v1 is the most vulnerable, with 99% and 75.6% attack success rates for the binary and multi-class attack scenarios, respectively. |
16:48 CET | 23.4.3 | KNOW YOUR NEIGHBOR: PHYSICALLY LOCATING XEON PROCESSOR CORES ON THE CORE TILE GRID Speaker and Author: Hyungmin Cho, Sungkyunkwan University, KR Abstract The physical locations of the processor cores in multi- or many-core CPUs are often hidden from the users. Current-generation Intel Xeon CPUs accommodate many processor cores on a tile grid, but the exact locations of the individual cores are not plainly visible. We present a methodology for physically locating the cores in Intel Xeon CPUs. Using this method, we collect core-location samples from 300 CPU instances deployed in a commercial cloud platform, which reveal a wide variety of core map patterns. The locations of the individual processor cores are not contiguously mapped, and the mapping pattern can differ for each CPU instance. We also demonstrate that an attacker can exploit an inter-core thermal covert channel using the identified core locations. The attacker can increase the channel capacity by strategically placing multiple sender and receiver nodes. Our evaluation shows that up to 15 bps of data transfer is possible with less than 1% bit error rate in a cloud environment, which is 3 times higher than previously reported results. |
16:52 CET | 23.4.4 | REVEAL: SINGLE-TRACE SIDE-CHANNEL LEAKAGE OF THE SEAL HOMOMORPHIC ENCRYPTION LIBRARY Speaker: Furkan Aydin, North Carolina State University, US Authors: Furkan Aydin1, Emre Karabulut1, Seetal Potluri1, Erdem Alkim2 and Aydin Aysu1 1North Carolina State University, US; 2Department of Computer Science, Dokuz Eylul University, TR Abstract This paper demonstrates the first side-channel attack on homomorphic encryption (HE), which allows computing on encrypted data. We reveal a power-based side-channel leakage of Microsoft’s Simple Encrypted Arithmetic Library (SEAL) that implements the Brakerski/Fan-Vercauteren (BFV) protocol. Our proposed attack targets the discrete Gaussian sampling in the SEAL’s encryption phase and can extract the entire message with a single power measurement. Our attack works by (1) identifying each coefficient index being sampled, (2) extracting the sign value of the coefficients from control-flow variations, (3) recovering the coefficients with a high probability from data-flow variations, and (4) using a Blockwise Korkine-Zolotarev (BKZ) algorithm to efficiently explore and estimate the remaining search space. Using real power measurements, the results on a RISC-V FPGA implementation of the Microsoft SEAL show that the proposed attack can reduce the plaintext encryption security level from 2^{128} to 2^{4.4}. Therefore, as HE gears toward real-world applications, such attacks and related defenses should be considered. |
16:56 CET | 23.4.5 | Q&A SESSION Authors: Begul Bilgin1 and Maria Mushtaq2 1Rambus Cryptography Research, NL; 2Telecom Paristech, FR Abstract Questions and answers with the authors |
C.1 Closing
Add this session to my calendar
Date: Wednesday, 23 March 2022
Time: 18:00 CET - 19:00 CET
Session chair:
Cristiana Bolchini, Politecnico di Milano, IT
Session co-chair:
Ingrid Verbauwhede, KU Leuven, BE
Time | Label | Presentation Title Authors |
---|---|---|
18:00 CET | C.1.1 | CLOSING Speaker: Cristiana Bolchini, Politecnico di Milano, IT Abstract Closing session |
18:30 CET | C.1.2 | AWARDS Speakers: Ingrid Verbauwhede1, Jan Madsen2 and Antonio Miele3 1KU Leuven - COSIC, BE; 2TU Denmark, DK; 3Politecnico di Milano, IT Abstract Award session Jan Madsen: EDAA Dissertation Awards Antonio Miele: Best IP Award |
18:55 CET | C.1.3 | SAVE THE DATE - DATE 2023 Speakers: Ian O'Connor1 and Robert Wille2 1Lyon Institute of Nanotechnology, FR; 2Johannes Kepler University Linz, AT Abstract See you at DATE 2023! |