In the early 1980s, the SPPDG found itself in need of specialized electronic computer aided design (ECAD) tools to assist in the design and simulation of high performance non-silicon digital integrated circuits (ICs) and of systems constructed with these specialized chips. There were few ECAD tools and no design frameworks at that time, so it was necessary to develop the appropriate tools and supporting framework to assist in the design and implementation of high clock rate digital signal processors. The Mayo-developed tools became an integrated system called MagiCAD.
The SPPDG used MagiCAD almost exclusively for semicustom gallium arsenide (GaAs) IC design from the mid-1980s until 1998. By that time, simply maintaining the complex infrastructure of an ECAD system had become a significant effort. Meanwhile, most of the major commercial ECAD vendors had improved their tool suites significantly, adding flexibility and support for "deep sub-micron" features, which raise design issues similar to those in high speed GaAs IC design. Today Mayo has many different commercial ECAD tools for semicustom IC, printed wiring board (PWB), and multichip module (MCM) design available in-house.
The SPPDG developed and deployed a number of ECAD application programs and integration scripts to enhance the performance of the various design systems in use today. These tools range in complexity from simple scripts (e.g., generate piecewise linear SPICE models for pseudo-random bitstreams) to complete engineering tools based on new techniques and algorithms (e.g., new measurement de-embedding analysis tools).
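As an illustration of the "simple script" end of that range, the following is a minimal sketch of how a piecewise linear (PWL) SPICE voltage source might be generated from a pseudo-random bitstream. It is not the SPPDG script. The PRBS7 polynomial, the function names (prbs7, pwl_points, pwl_source), the source and node names, and the voltage levels and timing values are all assumptions chosen for the example.

```python
def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a PRBS7 sequence (x^7 + x^6 + 1) from a 7-bit LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays stuck at zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of taps 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def pwl_points(bits, bit_period, rise_time, v_low=0.0, v_high=1.0):
    """Convert a bit list into (time, voltage) breakpoints with linear edges."""
    if not bits:
        raise ValueError("bits must not be empty")
    if not 0 < rise_time < bit_period:
        raise ValueError("rise_time must be positive and shorter than bit_period")

    def level(bit):
        return v_high if bit else v_low

    points = [(0.0, level(bits[0]))]
    prev = bits[0]
    for i, bit in enumerate(bits[1:], start=1):
        if bit != prev:
            t_edge = i * bit_period
            points.append((t_edge, level(prev)))            # hold the old level up to the edge
            points.append((t_edge + rise_time, level(bit)))  # ramp to the new level
            prev = bit
    points.append((len(bits) * bit_period, level(prev)))     # hold until the end of the last bit
    return points


def pwl_source(bits, bit_period, rise_time, name="VPRBS",
               node_pos="in", node_neg="0", **levels):
    """Format the breakpoints as a SPICE PWL source, one pair per continuation line."""
    pts = pwl_points(bits, bit_period, rise_time, **levels)
    body = "\n+ ".join(f"{t:.6g} {v:.6g}" for t, v in pts)
    return f"{name} {node_pos} {node_neg} PWL({body})"


if __name__ == "__main__":
    # 16 bits at 1 Gb/s with 100 ps edges (illustrative values only)
    print(pwl_source(prbs7(16), bit_period=1e-9, rise_time=100e-12))
```

Only the transitions are emitted as breakpoints, so long runs of identical bits do not lengthen the resulting source statement.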
Over a span of more than two decades, SPPDG created a number of electromagnetic modeling tools to help us understand and design special approaches for packaging of integrated circuits at the first and second levels (chip carriers and circuit boards respectively). These tools ranged from simple scripts to field solvers for controlled impedance transmission line structures based on the quasi-TEM assumptions. This work was subsequently extended to the creation of a full-wave transmission line simulator based on the Finite Edge Element Method (FEEM), which is derived from vector finite element methods. This tool was shown to be about ten times faster than traditional Finite Element Method programs, making it possible to directly simulate all frequencies of interest, instead of relying upon numerical extrapolation techniques. Part of this work included the development of a new transmission line modeling framework and a simplified user interface for creating 2-D transmission line models and controlling several EM simulators. The new tool, dubbed TNT, streamlined the creation of transmission line cross section models for HSPICE or other system simulators. TNT is available in source code and executable forms at http://mmtl.sourceforge.net/, where it has been downloaded more than 22,000 times over the past decade. Finally, we also developed tools to simulate noise on the power and ground planes of complex multi-layer printed wiring boards. Part of our work in power and ground modeling has been at the level of the individual IC. The cores of modern deep-submicron ICs operate at nominal voltages around 1 volt, and with edge dimensions as large as 20 x 20 mm and 200 million or more transistors per chip, “ground bounce” and “rail collapse” issues can seriously degrade performance.
We worked to understand and quantify these on-chip power distribution problems through design and laboratory testing of large CMOS test chips and through the development of analytical and empirical models that can be used in conjunction with commercial IC design tool suites. As indicated above, a number of these tools have been made available in the public domain through various open-source repositories.
Software For The Mining Of Very Large Data Sets (“Big Data”) Based On New Algorithms, Executed On Novel Computer Architectures
Throughout the past several decades, a variety of software tools and data storage environments have been developed to allow searches of data sets of varying sizes; examples of such environments include Sybase, Oracle, SQL, MySQL and others, based on relational database technologies. These tools evolved to conduct specific operator-driven searches against one or more sets of spreadsheet-like tables, or data stored in true multidimensional databases.
However, over the past decade, the data sets themselves in many fields have exploded in size and complexity, to the point that single data sets of multiple terabytes or even petabytes are increasingly common. Further, integration of many different data sources has also resulted in data that is not tabular in nature. Searching such large and complex data sets stresses and often exceeds the ability of the conventional relationally oriented tools. At the Mayo Clinic, where our clinical data sets are multiple petabytes in size and growing by terabytes per week, relational approaches have proven to be limited in capability and slow to execute complex searches.
Thus, since 2010, we have been exploring a different data storage and search approach, based on the concept of abstract mathematical “graphs”, and with an algorithmic and software tool set referred to as “graph analytics”. In this context, a “graph” is composed of “nodes” connected by “edges”, where each node contains an element of data, and where each “edge” represents a relationship between data elements. Further, the graph edges are both labeled and directed. Labels give semantic meaning to the edges, differentiating relationships such as “uncle” and “nephew,” and direction runs from subject to direct object (in the sense of English grammar). Relationships in the graph are described by triples of “subject – predicate – object,” such as “onePatient is DiagnosedWith specificDisease.” In turn, the direct object node can also be the subject for other relationships, such as “specificDisease is TreatedBy specificDrug.” This construct is propagated throughout the data set until all “nodes” are connected to all other nodes by a multiplicity of directed edges. The resulting image of such a structure looks like a “rat's nest” or a “furball” in N dimensions, where N can be a very large number (see the next two figures). The red segmented line in the next figure depicts a random-appearing “walk” through the connected data, illustrating how a search for a particular set of information elements might be carried out using algorithms tuned for searches through an N-dimensional directed graph.
A complex ad hoc data query mapped onto a graph of a synthesized data schema, illustrating a notional search path of a single query through the graph.
The following chart depicts an N-dimensional directed graph, with an exploded view of a small section, illustrating the nodes and their connecting edges.
A synthesized graph, with one section “expanded” to show a detail of a small number of “nodes” and their connecting “edges”.
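The “subject – predicate – object” structure described above can be illustrated with a minimal sketch. The TripleGraph class and its methods are hypothetical names invented for this example; they do not represent the Cray Graph Engine or any software in use at Mayo. The node and edge labels are the illustrative ones from the text.

```python
from collections import defaultdict


class TripleGraph:
    """Tiny in-memory store of directed, labeled edges (hypothetical example class)."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs; edges point subject -> object
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Record one triple, i.e. one labeled edge from subject to obj."""
        self._out[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object reached from subject along edges with this label."""
        return [o for p, o in self._out.get(subject, []) if p == predicate]

    def follow(self, start, predicates):
        """Walk a chain of edge labels from start and return the nodes reached."""
        frontier = [start]
        for predicate in predicates:
            reached = []
            for node in frontier:
                for obj in self.objects(node, predicate):
                    if obj not in reached:
                        reached.append(obj)
            frontier = reached
        return frontier


if __name__ == "__main__":
    g = TripleGraph()
    g.add("onePatient", "DiagnosedWith", "specificDisease")
    g.add("specificDisease", "TreatedBy", "specificDrug")
    # The object of the first triple acts as the subject of the second.
    print(g.follow("onePatient", ["DiagnosedWith", "TreatedBy"]))
```

Because each edge carries both a label and a direction, a two-step question such as "which drugs treat the diseases this patient has been diagnosed with" becomes a walk along two labeled edges rather than a join across tables.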
The advantage of a “graph” is that appropriate search algorithms tailored to the graph model of data storage are not limited to rectilinear searches, and in many cases can be configured to execute in parallel; an analogy would be a nest of leaf cutter ants, each of which brings back to the nest a single leaf, but a large number of ants are at work simultaneously (in parallel). We have identified more than ten such search algorithms (many of which cannot be run even conceptually on conventional tabular data) that are showing considerable promise for rapid searches of multi-terabyte data sets, both clinical and of other types. The primary downside to this type of data structure is that it must be assembled initially from data items typically residing in relational databases; however, the construction of the graph need only be performed once (or only occasionally if the data set needs to be updated). Once the graph has been generated, an unlimited number of different searches can be conducted without any need to rebuild or modify the constructed graph.
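The build-once, search-many pattern can be sketched as follows. This is an illustration under stated assumptions, not one of the search algorithms referred to above: the two small tables, the edge labels, and the function names (build_graph, find_path, run_queries) are invented, and breadth-first search stands in for whatever algorithm a real query would use. Python threads also do not provide true CPU parallelism for this kind of work, so the thread pool only shows how independent searches can share a single read-only graph.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows as they might come out of two relational tables.
DIAGNOSES = [("patientA", "diseaseX"), ("patientB", "diseaseY")]
TREATMENTS = [("diseaseX", "drug1"), ("diseaseY", "drug1"), ("diseaseY", "drug2")]


def build_graph(diagnoses, treatments):
    """One-time conversion of tabular rows into labeled, directed edges."""
    graph = defaultdict(list)
    for patient, disease in diagnoses:
        graph[patient].append(("DiagnosedWith", disease))
    for disease, drug in treatments:
        graph[disease].append(("TreatedBy", drug))
    return dict(graph)  # plain dict: searches only read from it


def find_path(graph, start, goal):
    """Breadth-first search; returns the triples along a shortest path, or None."""
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while came_from[node] is not None:
                prev, predicate = came_from[node]
                path.append((prev, predicate, node))
                node = prev
            return path[::-1]
        for predicate, nxt in graph.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, predicate)
                queue.append(nxt)
    return None


def run_queries(graph, queries, max_workers=4):
    """Run independent (start, goal) searches against the same unmodified graph."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda q: find_path(graph, q[0], q[1]), queries))


if __name__ == "__main__":
    graph = build_graph(DIAGNOSES, TREATMENTS)  # built once
    for result in run_queries(graph, [("patientA", "drug1"), ("patientB", "drug2")]):
        print(result)
```

The graph is constructed a single time and never modified by a search, which is what allows many searches to proceed side by side without coordination.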
The underlying hardware running these types of searches is critical. In our experience, and from actual test runs, conventional compute clusters and cloud deployments significantly limit the flexibility and throughput of the graph-oriented search algorithms. Thus the majority of our experimentation is conducted on a specialized supercomputer, the Cray/HPE Urika-GX, which is tuned at both the hardware and software levels for graph analytic data sets and searches and which resides on the Mayo Clinic Rochester campus (see the next photo).
The Cray/HPE Urika-GX data mining supercomputer installed on the Mayo Clinic Campus in Rochester, MN. The two cabinets contain 48 compute nodes, 1,536 Intel Haswell cores, 12 TB DRAM, 200 TB of RAID and distributed storage, integrated via the Cray “Aries” supercomputer network. Analytic software stacks include the Cray Graph Engine, Spark and Hadoop.