Introduction
Additive manufacturing (AM), or 3D printing, has become a staple technology for engineering development. It is used in major industries, such as aerospace and automotive, for both rapid prototyping and final production, and many hobbyists enjoy modifying and printing creative AM designs. More recently, the COVID-19 pandemic brought supply chain disruptions and a surge in demand for personal protective equipment; 3D-printed masks and face shields were used in the interim as hospitals struggled to protect medical staff from disease [1]. With new use cases emerging every year, the convenience and flexibility of this technology have led to an exponentially increasing number of computer-aided design (CAD) models stored in online databases. At the same time, CAD software has evolved to provide advanced capabilities such as creating complex shapes, developing assemblies of thousands of parts, collaborating on design files in the cloud, and conducting analyses (such as stress or heat transfer) directly from the CAD interface.
Despite the widespread use of AM, there is a critical gap in the ability to search for a specific geometry within a database of design files. Traditional search engines use file metadata and user-provided text labels. Many models, however, are poorly labeled, ambiguously named, or otherwise difficult to search for. For example, common tools such as fasteners have slight variations in diameter, head type, threading, and length, leading to thousands of possible variations that require a very specific query and a judiciously labeled database to find. Furthermore, a query such as “bolt” may match bolt fasteners, lightning bolts, the Pixar character named Bolt, or the Yamaha Bolt motorcycle. With such ambiguity in text labels, it is difficult to refine a search and retrieve results related to a user’s desired model.
In addition to the limitations of poor text labels for generic searches, the lack of native search functionality hinders new design methodologies such as topology optimization (TO) algorithms, which generate thousands of design variations based on the applied constraints [2], [3], [4]. In the absence of a robust search function, designers are unable to quickly pull the files with desired geometries and features. Both manual tagging and automated tagging via machine learning and statistical methods have been attempted in the realm of 3D model information retrieval. However, these techniques can be either labor- or computation-intensive, and they are unable to sufficiently address the needs of modern design capabilities. Geometry-based search and retrieval capabilities can help in selecting one or a small number of files containing a specific geometry. Developing robust search capabilities has the potential to transform an additive manufacturing cyber-physical system (CPS) into an automated service provider industry [5], where online platforms host large CAD datasets that can be searched and the files can be sent directly to 3D printers for manufacturing in-house or by third parties.
Some recent works have developed methods to refine AM search by extracting information from 3D shape geometries, but with limited scope and success. Funkhouser et al. [6] use spherical harmonics to perform 3D search, and FabSearch [7] expands that approach to find manufacturing service providers who have fabricated similar parts. A newer approach named Fourier Fingerprint Search (FFS) projects objects into 2D slices, then obtains fingerprints from the relative frequency maxima after a fast Fourier transform (FFT) [8]. These preliminary works require a 3D model in order to search for exact or similar models; therefore, these methodologies have limited scope in application. Finally, some recent machine learning works map real-world images to 3D models, but they employ generic images to do the mapping and do not explore the discrimination required for exact-match 3D model retrieval.
The scope of 3D search needs to be broadened to include a shape descriptor that encompasses all components of the additive manufacturing cyber-physical system. First, searching for 3D models based on real-world objects will prevent redesigns and shorten development times. Traditional 3D scanners are a growing industry, but they require special equipment and manual editing, and incur costs that are prohibitive for many smaller-scale applications. Indeed, there is a need for more accessible techniques, such as searching for a 3D model based on pictures taken with a smartphone as the search input. This would allow a user to retrieve their 3D model in seconds. Second, searching for 3D models based on GCode machine toolpath instructions is needed when manufacturers change 3D printers or print materials. Without the ability to retrieve the original 3D model, the manufacturer has no efficient way of generating GCode files for different machines. Third, search based on 3D models offers a broad range of cyber-physical applications, as discussed in the previous paragraphs.
To address the limitations of the state of the art and exploit the opportunities of machine learning in additive manufacturing, in this work we develop a universal search engine for 3D shapes called Coeus. In our methodology, we design and evaluate new shape descriptors that can be used across different types of input media, which span the entire additive manufacturing cyber-physical system process. Our contributions in this work can be summarized as follows:
Design and evaluation of new methods to gather input data from stereolithography (STL), GCode, and real-world objects;
Development of a novel silhouette generator for 3D shapes to enable object-based search;
Design of a robust search methodology based on silhouettes and depth-based images of 3D objects;
Performance acceleration and parallelization for converting GCode toolpaths to STL format.
Roadmap: The rest of the paper is organized as follows. Section II provides the background on the additive manufacturing process, as well as an overview of previously proposed 3D shape descriptors. Section III outlines the 3D search framework developed, and Section IV provides an in-depth look at the methodology used. In Section V, the experimental results and analysis are presented, and Section VI compares our work to prior research efforts. Finally, Section VII provides our concluding remarks.
Preliminaries
A. Additive Manufacturing File Formats
In the following paragraphs, we survey different file formats for 3D objects. This discussion will inform the need for a universal search capability across different file representations.
1) Point Cloud and VOX Files
A point cloud is a series of points in 3D space and is considered the simplest representation of a 3D object. For example, OBJ, PLY, and XYZ are common file formats for point cloud representations. The VOX file format is similar to a point cloud, except that each point lies on a discrete grid; in this case, each cube on the grid is referred to as a “voxel”. Point clouds are often used as the output of 3D scanning or neural network 3D object generation. While both the point cloud and voxel representations are beneficial for visualizing 3D models, they still represent a set of discrete (unconnected) points in 3D space and thus cannot be manufactured directly. Typically, these representations are converted to some other format, such as STL.
2) STL Files
STereoLithography (STL) is a file format used to describe the surface geometry of an object. In fact, the US Library of Congress describes the STL file as the de facto standard for 3D printing and rapid prototyping [9]. Even though STL files do not describe color or texture, STL is often used in advanced AM for portability across different CAD software libraries, where ultimately the material information and exact dimensions are added in. Internally, the STL file describes a surface as a set of interconnected polygons (typically small triangles). The number of polygons can be adjusted for proper resolution, so that more polygons allow for a smoother surface (at the expense of a larger file size). Today, most STL files are in binary representations, although ASCII variants are supported by many applications.
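For illustration, the excerpt below shows a single triangular facet in the human-readable ASCII STL variant; real models contain thousands of such facets, and the binary variant stores the same normal and vertex data in compact fixed-size records. (This is a generic example, not taken from any specific model.)

```
solid example
  facet normal 0 0 1
    outer loop
      vertex 0 0 0
      vertex 1 0 0
      vertex 0 1 0
    endloop
  endfacet
endsolid example
```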
3) GCode Files
GCode files provide the 3D printer with step-by-step instructions for motor and nozzle control. They are most often generated from an STL file by printer-specific slicing software. In particular, the process of converting an STL file to GCode is called slicing, as the model is divided into thin parallel layers for the printer to print one by one. This printer-specific toolpath code is the file that is needed for the manufacturing process to print a user’s 3D object.
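As a simplified illustration (a RepRap/Marlin-style dialect, not output from any particular slicer), the excerpt below shows the kind of movement commands a slicer emits: X/Y/Z set the nozzle position, E advances the filament, and F sets the feed rate. Actual GCode files also include printer-specific setup such as homing and temperature control.

```
G28                        ; home all axes
G1 Z0.20 F1200             ; move to the first layer height
G1 X50.0 Y50.0 E4.5 F1800  ; extrude while moving to (50, 50)
G1 X50.0 Y80.0 E7.2        ; continue the perimeter
```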
B. Search Methodologies for Additive Manufacturing
1) Text Labels
The most popular 3D search engines today rely on text-based search, where models are labeled with text fields (e.g., keywords). Text-based search engines include Yeggi [10] and STLFinder [11], which are dubbed the “Googles” of 3D search [12], [13]. While these search engines are popular, they still face the issues of inaccurate labeling, ambiguity, and narrow user input, as already discussed in the introduction.
2) 3D Shape Descriptors
Several ways of describing 3D objects have been attempted in the context of 3D search. Some earlier 3D search methods investigated the extraction of basic object features, such as volume [14], surface area [14], bounding boxes [15], [16], and edge paths [16], which either incur significant information loss or are inefficient to search on. Therefore, better ways to describe 3D shapes were later developed. Here we provide brief descriptions of these 3D shape descriptors.
a: Spherical Harmonics
Spherical harmonics were first used to describe 3D models by Funkhouser et al. [6], [17]. In their approach, each model is voxelized into 3D pixels aligned on an orthogonal grid. Then, using a spherical coordinate system, the model is divided into rotationally invariant rings. Harmonic functions are defined for these rings, and ultimately the Euclidean distance between harmonics is compared to enable k-nearest-neighbor search. Given that this method has rotational invariance, it seemed a suitable candidate for 3D model search. However, information is lost in voxelization, and more accurate descriptors were later developed for 3D model information retrieval. This descriptor also assumes knowledge of the entire 3D model, preventing images of objects from being searched. Nevertheless, spherical harmonics are still used in many applications, including medical imaging [18] and 3D-model-based neural networks [19], [20], [21].
b: Zernike Moments
Zernike moments can be used to describe both images [22], [23], [24] and 3D objects [25]. Briefly, Zernike moments are projections onto an orthogonal basis in polar or spherical coordinates, and the use of these coordinate systems makes the descriptor rotationally invariant [26]. Compared to spherical harmonics, these moments are more difficult to compute and suffer from discretization errors; however, their rotational properties and moderate robustness to changes in shape make them one of the most popular descriptors for shape-based matching. Zernike descriptors are considered region-based since they use information from the entire shape [22], [27], [28].
c: Simple Fourier
In a simple Fourier descriptor, 3D objects are converted into 2D images and a 2D Fourier transform is performed on each image. Often, the signature is computed by calculating the centroid and then measuring the distance between the boundary points and the centroid. This is done in the polar domain and is repeated for different Fourier coefficients. Simple Fourier descriptors are easy to compute, capture both global and local features, and are insensitive to noise. They are considered contour-based shape descriptors since they depend on the contour (or boundary) of the object [7], [14], [22], [29].
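As a rough sketch of this idea (our illustration, not code from the cited works), the snippet below computes a centroid-distance signature for one 2D projection and keeps a handful of normalized Fourier magnitudes; the number of retained coefficients is an arbitrary choice here.

```python
import numpy as np

def fourier_descriptor(boundary_points, num_coeffs=16):
    """boundary_points: (N, 2) array of (x, y) contour samples of a 2D projection."""
    centroid = boundary_points.mean(axis=0)
    # Centroid-distance signature: distance from each boundary point to the centroid.
    signature = np.linalg.norm(boundary_points - centroid, axis=1)
    # Magnitudes of the 1D Fourier transform discard the starting-point phase;
    # normalizing by the DC term makes the descriptor scale invariant.
    coeffs = np.abs(np.fft.fft(signature))
    return coeffs[1:num_coeffs + 1] / coeffs[0]
```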
d: Fourier Fingerprint Descriptor
In contrast to the centroid-based method of the simple Fourier descriptor, the Fourier fingerprint descriptor uses a local maxima filter to gather a set of points [8]. The first point acts as an anchor point, and the distance is measured to the remaining “target” points. The exact number of target points is configurable as the “fan-out”, with larger fan-out values making the signatures more unique, but harder to match. Finally, a locality-sensitive hash is applied to anchor-target pairs, generating unique signatures for each 3D part.
e: Other Descriptors
Other shape descriptors include ones based on an object’s convexity [30], shape histograms [31], cords [15], symmetry descriptors [32] and statistical moments [15]. These descriptors have lower information retrieval accuracy than the descriptors discussed earlier, and to our knowledge have not been used for view-based 3D model search.
3) From 3D to 2D: Getting Images
Many works in the area of 3D search convert the CAD models into 2D images for easier processing. Some of these methods slice the object and are reliant on 3D geometries. Others are view-based, where objects are viewed and “photographed” from different angles. These are illustrated in Figure 2.
Figure 1: Additive manufacturing process chain including both the design and manufacturing phases. Our work introduces an all-encompassing search engine, which can retrieve STL files based on all downstream deliverables, including the STL models themselves, printer-specific GCode files, and already-manufactured parts as real-world objects.
Figure 2: Retrieving images: three alternative methods of obtaining 2D images from 3D shapes: (a) lightfield slices, (b) fine-grained views, and (c) Fourier fingerprint slices.
a: View-Based Methods: Lightfield Slices
Works based on the lightfield descriptor typically use silhouettes from different viewing angles to convert a 3D model into the 2D space. A dodecahedron with twenty vertices is used to generate ten silhouettes, since silhouettes are symmetric under 180-degree rotations. To get the best results, a series of dodecahedrons is used; for example, Chen et al. used ten dodecahedrons [22]. We also adopt this method for our experiments, except we use a slightly smaller set of nine dodecahedrons.
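To make the vertex and silhouette counts concrete, the short sketch below (our illustration, not code from [22]) generates the 20 vertices of a regular dodecahedron and keeps one viewpoint per antipodal pair, yielding the 10 camera directions used for silhouette capture.

```python
import itertools
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def dodecahedron_vertices():
    # The 8 cube vertices (±1, ±1, ±1) ...
    verts = [np.array(v, dtype=float) for v in itertools.product((-1, 1), repeat=3)]
    # ... plus the 12 vertices (0, ±1/φ, ±φ) and their cyclic permutations.
    for a, b in ((0, 1), (1, 2), (2, 0)):
        for s1, s2 in itertools.product((-1, 1), repeat=2):
            v = np.zeros(3)
            v[a] = s1 / PHI
            v[b] = s2 * PHI
            verts.append(v)
    return verts  # 20 vertices in total

def antipodal_unique(verts):
    # A silhouette seen from -v carries the same shape information as the one
    # seen from v, so only one viewpoint per antipodal pair is kept.
    kept = []
    for v in verts:
        if not any(np.allclose(v, -u) for u in kept):
            kept.append(v)
    return kept  # 10 viewpoints

views = antipodal_unique(dodecahedron_vertices())
print(len(views))  # -> 10
```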
b: View Based Methods: Fine Grained
Several alternative methods of viewing objects from different angles have also been presented in the literature. In this case, uniform sample points are computed based on the intersection of concentric circles centered on the x and y axes. These methods provide finer-grained control over the number of images, but lack the uniformity of the lightfield slicer [33].
c: 3D Geometry Slicer: Fourier Fingerprint Slices
In Mouris et al. [8], 3D objects are sliced along 2D planes, and the 3D slices are then projected onto those planes. The 2D planes are rotated in 45 degree increments, resulting in far fewer projections than the lightfield descriptor. However, since the 3D object is sliced, this method is more suitable for digital files, and cannot be used on real-world objects in a non-destructive manner.
4) Generative Adversarial Networks for 2.5D Images
There are a number of recent works that utilize machine learning to develop depth-based images from a 2D photo. These techniques often rely on generative adversarial networks (GANs), where a generator neural network tries to create a 2.5D image and a separate discriminator network provides feedback on whether the depth image is realistic. The 2.5D images are able to capture 3D information about real-world objects, making them the baseline for our methodology in this work [34].
C. Content Based Image Retrieval (CBIR) Methods
While 3D search has received limited attention, 2D content-based image retrieval has been studied extensively over the past decade. Unfortunately, many of these techniques use color or texture descriptors, which are not as helpful for 3D model search. Figure 3 shows what CBIR search looks like using Google Images. Here we provide a brief overview of the state-of-the-art work in this field.
Figure 3: Traditional CBIR methods: modern search techniques rely heavily on text-based labels, followed by color and texture information. For 3D part search across different input media, color and texture can be misleading; shown are two examples from Google image-based search in which color and texture hurt search performance.
1) Color Based Descriptors
There are several color-based techniques used in modern CBIR systems [35], [36]. Color moments [37], [38], i.e., the mean, variance, and skewness of each color channel, can be used to narrow the image search space, but do not offer fine-grained discrimination. Comparing color histograms [38] of each channel provides greater discrimination, but spatial elements of the photo are not taken into account. Color coherence vectors [39] enhance the histograms by taking into account other colors in the immediate region. Finally, the color correlogram [40] maps the spatial correlation between colors, adding a high degree of discrimination but also increasing the size of the query index.
When it comes to additive manufacturing part search, color descriptors do not provide the same level of benefit as they do for CBIR. First, 3D parts are typically monochrome, offering far less color discrimination than natural images with unique features and backgrounds. Second, objects can be printed in any color, so this descriptor would be irrelevant for many 3D part search applications.
2) Texture Based Descriptors
Texture-based features try to describe how an object would “feel” to a human observer. For pictures, they are indicative of the object being described. One common feature set is the Tamura features [41], of which coarseness, contrast, and directionality of edges are the most important. Wold features [42] are also used in CBIR to determine the periodic and random components of a texture pattern, and simultaneous auto-regressive (SAR) models [43] use Markov random fields to achieve similar goals. Finally, Gabor filters and wavelet transforms decompose the image into frequency components, since texture information is easier to extract in the frequency domain.
While texture is great for 2D image CBIR, the textures of 3D models change based on input medium (i.e., GCode, STL or real world images) and for subcategories within, such as STL resolution and printer settings. Therefore, instead of using texture as a feature set, Coeus looks to eliminate texture-based features through 2.5D object creation.
3) Shape Based Descriptors
Shape-based descriptors [44] provide some information in practical CBIR, but real-world CBIR is unable to segment and isolate image objects for shape feature extraction [45]. Segmentation-based neural networks have made some progress in retrieving objects in images, but there is still room for growth in this area [46], [47]. For 3D object search, the object geometry remains critical for finding similar objects. The shape descriptors discussed in this section are derived from 2D image descriptors, namely spherical harmonics [48], Zernike moments [49], and Fourier descriptors [50].
D. Reverse Engineering Additive Manufacturing
Reverse engineering of real-world objects into 3D models is a widely used technique. However, this approach is not perfect and often requires manual intervention to polish the retrieved model. Here, we briefly discuss professional 3D scanners, which require special equipment, and neural network based reverse engineering, which is a cost effective approach, but yields lower quality results.
1) Professional 3D Scanners
Professional 3D scanners come in many shapes and sizes depending on the dimensions of the target object and target application. These scanners use infrared depth sensors to capture information about the real-world object from many different angles, which can be converted into a baseline 3D model. These baselines give a general idea about the input shape, however, additional manual labor is required before the final CAD file is ready. Specifically, a skilled CAD engineer is needed to align the viewing angles, remove any defects, and restore the structural integrity of the 3D object. This can take several hours even for simple parts [51].
2) Generative Adversarial Networks for 3D Models
Recent work has focused on taking pictures of real-world objects and generating 3D models using machine learning. This method of 3D model generation opens exciting research directions from a machine learning perspective, but the current performance is lacking and the scope is limited. First, many models are specific to certain classes of inputs, limiting the usefulness of the neural network. For example, [20] requires the user to input whether the picture is of a chair, table, or couch. Several recent models lift this requirement, but performance is still limited to the data the network was trained on. Second, the point cloud output still has many perturbations and imperfections, making it difficult to use for 3D search; thus, significant advancements in 3D model generation are needed before it can be used reliably for 3D model search. Examples of the limitations of these methods are shown in Figure 4.
Figure 4: Genre evaluation: here we present example outputs of the Genre code. The bearing is converted into a chair, since the neural network was trained only on furniture. Even the chair output, for which the model is trained, is far from perfect: the voxelized chair looks much different than the query image, filling in the armrests and adding more cushion, and the 3D output has many deformities. Overall, these results are too restrictive and too noisy to use in 3D search.
Framework Overview
An overview of our Coeus search engine is illustrated in Figure 5. First, pictures of the query object from different angles are obtained from the various input media. Next, the MarrNet GAN [34] is adopted to generate 2.5D images from each angle. Finally, the images undergo a Fourier fingerprint transformation to generate rotation-invariant, depth-aware signatures that are used for search.
Figure 5: Coeus system overview: Coeus allows several different inputs, as the base shape descriptor relies on 3D information extracted from 2D photographs. These raw object views are first converted into 2.5D depth-based images. Afterwards, a 2D Fourier transform is performed, local maxima “peaks” corresponding to shape inflection points are detected, and a SHA-1-based signature is generated from the distances between the peaks.
A. Picture Generation
The first part of the process entails generating images of the object from various angles. This step is essential for Coeus, as it allows us to grab inputs from different media (i.e., real-world object, STL, GCode) and convert them into a common search query.
1) Real World Objects
In order to retrieve a 3D object from a database based on a real-world part, it is first required to capture some data about the query object. In effect, the goal is to take pictures of the object from different view angles, either as individual photos or as frames extracted from a video, and pass them to our 3D search engine so that it can retrieve similar parts. In both cases, the resulting 2D images of the real-world object are transformed by our Coeus search engine into the Fourier domain to match an enrolled 3D shape in the database.
2) STL Files
Searching for similar objects given an STL file offers significant benefits, both for a user-facing search engine that finds similar parts given an STL input query, and for improving search results with topology-based feedback. In our methodology, we have developed an OpenSCAD program that analyzes the STL input query and can quickly generate multiple projections from different angles by rotating the STL object. In this case, we adopt a lightfield slicer to collect 18 images by rotating the dodecahedron nine times across two axes. Note that since the collected images are rotationally invariant, we can safely discard rotations along the third axis.
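A minimal sketch of this projection step is shown below. It drives the OpenSCAD command-line renderer from Python; the flag names (--camera, --imgsize, --projection) and the viewing distance are assumptions based on recent OpenSCAD releases, not the exact invocation used by Coeus.

```python
import subprocess
import tempfile
from pathlib import Path

def render_views(stl_path, viewpoints, out_dir, distance=200, size=(256, 256)):
    """viewpoints: list of (x, y, z) unit vectors (e.g., dodecahedron vertices)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Wrap the STL in a tiny .scad file so OpenSCAD can load and render it.
    with tempfile.NamedTemporaryFile("w", suffix=".scad", delete=False) as f:
        f.write(f'import("{Path(stl_path).resolve()}");\n')
        scad_file = f.name
    for i, (x, y, z) in enumerate(viewpoints):
        eye = (x * distance, y * distance, z * distance)
        subprocess.run([
            "openscad", "-o", str(out_dir / f"view_{i:02d}.png"),
            f"--imgsize={size[0]},{size[1]}",
            "--projection=ortho",
            # 6-value camera: eye position followed by the look-at center (origin).
            f"--camera={eye[0]},{eye[1]},{eye[2]},0,0,0",
            scad_file,
        ], check=True)
```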
3) GCode Toolpaths
GCode files are the machine-specific instructions used to manufacture 3D printed parts. In practice, it is desirable to also be able to find a model based on a GCode toolpath, for example when a manufacturer wants to upgrade or change printers. As it has been reported, GCode can be reverse engineered to recreate an STL model [52]; however, the process of analyzing GCode and reversing it back to an STL file is inherently lossy, so only an approximate 3D model can be recovered. Nevertheless, since our shape-based search methodology can perform exact match information retrieval, we are able to find the original model in a database based on the reverse engineered STL file. Notably, Coeus parallelizes the transformation of a given GCode to its corresponding STL script, to further accelerate the search operation. Finally, after the STL is reconstructed from the GCode, Coeus uses the STL methodology to generate projections, as described above.
B. 3D Descriptor Image Generation
After Coeus collects a set of pictures of the query object, they need to be transformed into usable shape descriptors. This step is essential in our methodology since it simplifies the problem and reduces the input space to black and white pixels. In other words, differences in print material, color, lighting, glare, and picture angle, which can cause significant variation in the images, can be removed without losing shape information once a silhouette or 2.5D image transform is applied. By removing this variance, Coeus is able to search for a model regardless of the input medium type or how it was printed.
Coeus employs either the object’s silhouette or its 2.5D depth-based images to generate unique shape descriptors. The key benefit of using silhouettes is that they are invariant to 180-degree rotations across all axes. As an example, assume the target object is a coffee cup; if we compare a baseline projection from the top to a projection rotated 180 degrees around the vertical axis, the silhouette information stays the same. Therefore, using silhouettes as descriptors has the potential to store only half of the signatures in the database. However, as shown in our experimental results, the accuracy loss from not having depth information (i.e., 2.5D images) far outweighs any potential memory savings.
1) Image Generation Methods
To apply the 2.5D transform on a given 2D image, we adopt the pretrained MarrNet [34] GAN. The transformed outputs closely match the depth images we would expect. While MarrNet also comes with a silhouette generator, the Coeus search engine is designed to improve on both the time and the quality of silhouette generation. Therefore, in this work, we have designed a bespoke integrated silhouette generator, developed in MATLAB.
C. Signature Generation
To generate searchable signatures for 3D shapes, Coeus adopts a Fourier fingerprint scheme, due to its robustness to small changes and rotational invariance. In the original Fourier fingerprint scheme [8], the fingerprints were taken based on a binary point cloud, comprising 0’s and 1’s in a 3D grid. Conversely, Coeus challenges the use of a binary point cloud, as it loses any depth information in the input projections. Instead, Coeus computes the 2D Fourier transform by using floating point representations of pixels ranging from 0.0 all the way to 1.0. Our experimental evaluation shows that this approach preserves all the depth information in a signature, leading to more robust and accurate search functionality.
Implementation Details
Coeus functions in several stages. The first is generating the input, where Coeus can perform GCode reversion, as well as consume STL or image inputs. The second step is image extraction, where Coeus generates depth-based and silhouette images utilizing state-of-the-art generative adversarial networks (GANs) and signal processing methods, respectively. Finally, we provide a modified Fourier fingerprint that can handle these silhouette and depth-based images.
A. Input Formats for Coeus Search
All input data into the Coeus backend are in the form of images from various viewpoints. Based on the different raw input media Coeus can accept, these viewpoints are generated in different ways. Here we discuss three alternative raw inputs.
1) STL Based Methods
STL files are well supported for 3D printing and are easy to use for generating different viewpoints of an object. Coeus adopts the OpenSCAD rendering engine to rotate the object and take “snapshots” from different viewing angles. To decide which viewing angles are optimal, Coeus implements a light field slicer variant, where viewpoints are placed on all 20 vertices of a regular dodecahedron. This process is repeated 9 times, rotating by {0, 30, 60} degrees across each of two axes (nine rotation combinations in total), as outlined in Algorithm 1.
Algorithm 1 Database Registry of STLs
for each 3D model in the STL database do
    for each rotation pair (r_x, r_y) with r_x, r_y in {0, 30, 60} degrees do
        for each vertex of the dodecahedron do
            render the rotated model from the vertex viewpoint
            if Silhouette Descriptor then
                extract the silhouette image (Algorithm 2)
            else if Depth Descriptor then
                extract the 2.5D depth image (Algorithm 3)
            end if
            generate Fourier fingerprints and enroll them in the database
        end for
    end for
end for
2) Camera Based Methods
Input images of a real-world part taken by a user would need to capture the object from different angles. This could be done by taking smartphone pictures of an object from various angles, or by taking a video of the object rotating and extracting the individual frames. In our approach, we simulate this process by taking images from the STL files. For this input method, we continue to use our light field slicer variant to capture the object at different angles, but use a random subset of images as input to our search engine. By using this simulation method, we were able to search on all objects in our database.
3) GCode Based Methods
Since GCode comprises many motor commands used for printing, one way to approximate the original STL is to simulate the printing in a virtual 3D space. Once a 3D object is reconstructed as the union of virtual filament segments inside an .scad file, it can be converted into an STL [52]. However, rendering large or complicated objects can take a considerable amount of time, as rendering time grows with model size.
To optimize this process, we developed a parallel GCode-to-SCAD compiler that produces multiple output files, so that their union reconstructs the original object. We remark that the number of lines per SCAD file needs to be judiciously configured: if there are too many output files, this will increase the time Coeus requires to compute the union of all fragments during the last step. Likewise, each SCAD file should include a sufficient level of detail to allow the generation of a high-resolution STL. However, if the SCAD files are too large, then the rendering time grows as well. Our analysis shows that SCAD files of about five hundred lines offer an optimal tradeoff across models of different complexities.
The next step renders each of the SCAD files independently across multiple CPU cores. The output of this step is one STL file per SCAD input. Each of these STL files needs to be combined by a union operation in the final step. This step also requires careful optimization, since merely loading an STL file into memory may take a considerable amount of time. Thus, we employ parallel reduction in this step, where one process thread combines ten STL files into a single output STL during each iteration of our algorithm.
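The sketch below illustrates this pipeline under stated assumptions (it is not the authors' implementation): the fragment .scad files are assumed to have already been produced by the GCode splitter, each fragment is rendered to STL in a parallel worker process, and the resulting STLs are merged through a fan-in union reduction, again via OpenSCAD.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path

def render_fragment(scad_path):
    # Render one SCAD fragment to an STL with the same base name.
    stl_path = Path(scad_path).with_suffix(".stl")
    subprocess.run(["openscad", "-o", str(stl_path), str(scad_path)], check=True)
    return stl_path

def union_stls(stl_paths, out_path):
    # Build a throwaway .scad that unions the fragments, then render it.
    scad = Path(out_path).with_suffix(".scad")
    imports = "\n".join(f'  import("{p.resolve()}");' for p in stl_paths)
    scad.write_text("union() {\n" + imports + "\n}\n")
    subprocess.run(["openscad", "-o", str(out_path), str(scad)], check=True)
    return Path(out_path)

def reconstruct(fragment_scads, workers=4, fan_in=10):
    with Pool(workers) as pool:
        stls = pool.map(render_fragment, fragment_scads)
        # Parallel reduction: repeatedly union groups of `fan_in` STLs.
        level = 0
        while len(stls) > 1:
            groups = [stls[i:i + fan_in] for i in range(0, len(stls), fan_in)]
            jobs = [(g, Path(f"merge_{level}_{j}.stl")) for j, g in enumerate(groups)]
            stls = pool.starmap(union_stls, jobs)
            level += 1
    return stls[0]
```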
B. Image Extraction
Once we have a set of viewpoint images for the query part, additional post-processing is required to generate a valid silhouette. Here we discuss two alternative methods.
1) GAN Based Approach
Neural networks are becoming increasingly powerful, and can be exploited for advanced image processing. In this case, it is possible to adopt the MarrNet generative adversarial network (GAN) to generate silhouettes and 2.5D depth images based on a 2D picture of an object. Notably, this neural network approach is beneficial, as it further allows automated removal of busy backgrounds in order to isolate the target object. One key drawback, however, is the computational overhead, as GANs are slower than a direct signal processing approach.
Coeus therefore employs the first half of the pre-trained neural network of MarrNet to output the 2.5D images. The second half of that neural network, which performs 3D object reconstruction, was removed to lower the latency of image processing.
2) Signal Processing Based Approach
A signal processing algorithm offers significant benefits and high accuracy in silhouette extraction for pictures taken with a clean background. For our analysis, we have designed a silhouette extraction algorithm to generate the silhouettes (Algorithm 2). The algorithm creates two images: a raw thresholded image based on the RGB values, and a filled silhouette image. The thresholded image helps detect any holes in the object that would otherwise be filled in by the silhouette fill operation.
Algorithm 2 shapeExtractionSil(img_input)
Output: Silhouettes
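Since the paper's generator is implemented in MATLAB, the Python sketch below only approximates the spirit of Algorithm 2: it assumes a bright, uncluttered background, produces the raw thresholded mask and the filled silhouette, and the threshold value is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage
from PIL import Image

def shape_extraction_sil(img_path, threshold=0.85):
    rgb = np.asarray(Image.open(img_path).convert("RGB"), dtype=float) / 255.0
    # Raw thresholded mask: pixels darker than the (bright) background belong
    # to the object; this preserves holes such as through-cuts and engravings.
    raw = rgb.mean(axis=2) < threshold
    # Filled silhouette: close the outline so the outer contour is solid.
    filled = ndimage.binary_fill_holes(raw)
    # Keep only the largest connected component to suppress background specks.
    labels, n = ndimage.label(filled)
    if n > 1:
        largest = np.argmax(ndimage.sum(filled, labels, range(1, n + 1))) + 1
        filled = labels == largest
        raw = raw & filled
    return raw, filled  # thresholded image and filled silhouette
```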
C. Database Enrollment
After Coeus captures the different input media and extracts the geometric properties, the next step is to generate signatures and enroll the model in the database. These procedures depend on the type of descriptor being enrolled.
1) Silhouette Enrollment
For the silhouette-based approach, the processed images can be loaded into a NumPy array and directly converted into a Fourier fingerprint signature [8]. Here, the 2D Fourier transform is taken on the binarized image and local maxima (peaks) are identified. Then, searchable signatures are generated by hashing the Euclidean distance between peaks. In effect, this method creates signatures based on changes in frequency, i.e., changes in the edges of a silhouette. Notably, this fingerprinting process compresses the silhouette memory footprint from about 1 MB down to 200 bytes (comprising 10 fingerprints of 20 bytes each).
2) Depth Image Enrollment
For depth-based image generation, Coeus adopts the GAN from MarrNet [34] to generate a high-quality 2.5D image output; this is illustrated in Algorithm 3. In this case, after the depth image is obtained, Coeus employs a more elaborate methodology, since the original Fourier fingerprint descriptor works only on binarized projections. Indeed, 2.5D images are hard to project as they only contain information pertaining to one viewing angle (i.e., they cannot be rotated to generate rotational slices [8]). In theory, one could slice the object into two rotational slices from opposite angles at 180-degree rotations and compare the projections; however, for human-provided pictures (e.g., smartphone photos of an object), slight variations in camera angles would cause a misalignment of slices, leading to poor search results.
Algorithm 3 shapeExtractionDep(img_input)
Output: Depth image
To address this challenge, we revise the Fourier fingerprint method for Coeus. Specifically, Coeus first performs the 2D Fourier transform on an L1-normalized depth-based image, with pixel values ranging from 0.0 to 1.0. The CoeusFFS function then uses these normalized floating point inputs rather than binary 3D model slices or silhouettes. After the Fourier transform is computed, signature generation proceeds in the same way as in the silhouette approach. The resulting fingerprint size for depth-based models is also 200 bytes per image, as the fingerprinting parameters (e.g., the fan-out [8]) remain the same for both models.
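A condensed sketch of this signature-generation step is given below, based on the Fourier fingerprint scheme [8] as adapted here for floating-point inputs; the peak count, filter size, and fan-out values are illustrative assumptions rather than the exact Coeus settings. The same function applies to binarized silhouettes, since a binary image is simply a float image restricted to {0.0, 1.0}.

```python
import hashlib
import numpy as np
from scipy import ndimage

def coeus_ffs(image, num_peaks=11, fan_out=10):
    """image: 2D float array in [0.0, 1.0] (a 2.5D depth map or a binary silhouette)."""
    # 2D Fourier transform; keep the log-magnitude spectrum.
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(image))))
    # Local maxima ("peaks") detected with a maximum filter, strongest first.
    local_max = spectrum == ndimage.maximum_filter(spectrum, size=9)
    peaks = np.argwhere(local_max)
    peaks = peaks[np.argsort(spectrum[local_max])[::-1][:num_peaks]]
    # Anchor-target pairing: hash the distance between the anchor and each target.
    anchor, targets = peaks[0], peaks[1:1 + fan_out]
    fingerprints = []
    for t in targets:
        dist = np.linalg.norm(anchor - t)
        fingerprints.append(hashlib.sha1(f"{dist:.3f}".encode()).digest())  # 20 bytes each
    return fingerprints  # e.g., 10 fingerprints x 20 bytes = 200 bytes per view
```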
A significant benefit of 2.5D images is that differences in depth contrast are detected as local maxima, resulting in a fingerprint that essentially encodes 3-dimensional data. With this extra information, the system is better able to retrieve a 3D model.
Experimental Evaluation
A. A Dataset of Popular Models
For our experiments, we employ a custom dataset comprising 136 of the most popular 3D models on the online platform “Thingiverse” to test Coeus on a wide range of geometries. The dataset is composed of the following nine classes: boats, chess pieces, coasters, cups, dice, pen holders, phone cases, vases, and wallets. In particular, our dataset contains a wide range of 3D model complexities, from Star Wars-themed chess pieces and cups to standard chess piece and coffee mug shapes. Meanwhile, some of the classes, such as wallets and phone cases, contain very similar models, making it more challenging to perform exact matching for these objects. Given these characteristics, this dataset provides a good variety of geometries and classes and helps draw useful insights into search performance. We remark that while this dataset is optimized for evaluating exact matching, it is not designed for classification (which was not an objective in this case).
B. Exact Matching Results
We evaluated the exact matching accuracy for both the silhouette and 2.5D depth-based methods, attempting to recover an STL model based on a randomly selected subset of object images. Our experimental results are presented in Figure 6.
Figure 6: Coeus exact match results: top-1 exact matching results for two different approaches. For silhouette-based search, certain classes have low accuracy regardless of how many views are used in the query; silhouettes are sub-optimal shape descriptors, especially for hollow objects where depth information is critical. Conversely, using 2.5D view-based images, we were able to achieve high accuracy even with a limited number of view angles in a query.
1) Retrieval Rate
For 2.5D depth images, Coeus scored perfect accuracy when more than twenty views of the object were used as part of the query. In practice, such views can be obtained from a short video of the object rotating. This 100% exact match accuracy indicates that the depth-based images combined with the Fourier fingerprint signatures were able to fully capture the 3D information of an object.
While Coeus can perform perfect exact-match retrieval for twenty or more 2.5D depth images, the results become sub-optimal for certain classes when a smaller number of views is used. We further analyzed which queries did not exhibit perfect exact matching rates when too few images are used for search. Our analysis indicates that the queries that did not perform well were those that did not have salient features visible from the viewing angles selected. As viewing angles are randomly selected in our methodology, some of the collected images can be sub-optimal when the number of views is reduced. In effect, reducing the number of viewing angles can prevent Coeus from distinguishing the geometries of two similar objects, as shown in Figure 7.
Figure 7: Poor viewing angles: a random subset of viewing angles can lead to poor exact matching in some cases. Here we show an example of two different 3D coaster models; it is easy to see how these two objects can be confused when viewed from the bottom. In such cases, more than one viewing angle is needed to tell the two models apart.
2) Silhouettes vs. 2.5D Depth-Based Images
While Coeus can capture an object’s geometry with twenty 2.5D images, silhouette images flatten important 3D information. To counter this, a larger number of silhouette images are needed for perfect object matching. Most classes could achieve 100% exact match accuracy with forty different viewing angles. Looking at the few objects that had difficulty with silhouette based retrieval, we observe that any information about the engravings was lost in the silhouette generation process, making it more challenging to classify these shapes. For objects with these subtle geometric features, the best method to capture the full geometry is to use a depth-based image model.
3) Performance Across Classes
Comparing how the different classes performed in exact matching queries, we are able to confirm that instances of sub-optimal search results can be attributed to missing viewing angles corresponding to salient features. The two classes that required a larger number of views to enable perfect exact-match retrieval were dice and coasters. These objects have similar 3D geometries but different engravings, making them harder to distinguish; likewise, these engravings may not be visible at different viewing angles. An example of a misidentified signature is shown in Figure 7. We observe that even from the raw STL-based image, it is difficult to identify the differences between the two coasters, making it a very challenging task for an information retrieval system. Despite the similarity of these shapes, when Coeus was provided with sufficient viewing angles it was able to detect the engravings and achieve perfect information retrieval. Overall, phone cases performed the best across all of the classes, overcoming the fact that phone cases have a very similar box shape. Here, the unique camera opening and the auxiliary port positions on the phone case provided Coeus with very distinct signatures. Specifically, since these features are essentially holes on the 3D object, they are visible at most viewing angles. This allows perfect exact matching even with a limited number of input queries.
C. Categorical Results
Categorical matching results depend on the class, as shown by the PR curves in Figure 8, and some examples of the corresponding queries are presented in Figure 9. As we observe, classes with uniform shapes (such as chess pieces, coasters, and phone cases) perform well for categorical matching, as shown in rows A and C of Figure 9. Classes and queries with more diverse shapes (such as boats, pen holders, and vases) offer lower matching performance with other objects in the same class, due to the greater shape diversity within those classes and the discrimination of the dataset.
Figure 8: Coeus categorical match results: classification precision-recall curves, where each query used 5 random views of the object. Classes with higher similarity between models performed better than classes with more unique shapes.
Figure 9: Coeus sample queries: a sample of the best- and worst-case matching results. Rows A and C demonstrate best-case shape-based matching for silhouette chess pieces and depth-based phone cases; these classes have distinct shapes that are correlated with each other. Row B shows the worst-case silhouette-based search for vases; this query had relatively few signature matches with other objects in the database. Row D shows a depth-based search of a uniquely shaped phone case; due to its difference in shape from other phone cases, it failed to locate other phone case objects in its search.
For categorical matching using this dataset, silhouettes performed better than depth-based search. Based on the number of matching signatures and the qualitative results, this observation is attributed to the higher amount of discrimination in depth-based search. Recall that the similarity metric used for retrieval is based on number of matching signatures. Therefore, for depth-based search, differences in the 3D models are greater and lead to an increased number of different signatures.
Related Works
View-based 3D model information retrieval has focused on four main areas: query by sketch, query by silhouette, query by depth, and ML-based solutions. We also cover one view-selection method [23] and one CBIR query-by-approximate-shape method [73]. Table 1 summarizes the features and performance of related works. We do not consider other content-based image retrieval (CBIR) methods, as their reliance on color and texture makes them ill-suited for 3D search; we discuss these features in Section II.
A. Query by Sketch
There are many works leveraging contour-based shape descriptors [6], [53], [54], [55], [56] for sketch-based retrieval, which maps 2D human-drawn contour sketches to 3D models. Loeffler [53] and Li et al. [55] use binary images similar to our silhouette approach, whereas Funkhouser et al. [6] and Pu and Ramani [54] utilize spherical harmonics on 2D contours. Eitz et al. [56] map 3D models to line drawings and cluster the drawings based on a Gabor filter. Sketch-based 3D model retrieval methods must deal with varying user inputs, and the limited discrimination of human sketches makes exact matching impractical; for the same reason, these methods do not scale well to large datasets.
Similarly, Chen et al. [22], [57] utilize user-drawn silhouettes, with search based on a combination of Zernike moments and Fourier coefficients computed on the silhouettes. However, their experiments featured user silhouette drawings, which limits the discrimination and scalability for exact matching.
Recent machine learning works [58], [59], [60], [61] perform sketch-based 3D model retrieval using convolutional neural networks, both for sketch and feature extraction and for clustering, in order to correlate sketches to 3D models. Nevertheless, this approach suffers from two major drawbacks. First, the application focuses on categorical matches, and the use of hand sketches limits discrimination and exact matching. Second, both the feature extraction and the clustering need to be trained, unlike Coeus. Overall, it is unclear how well these works can handle objects outside of the training set.
B. Query by Silhouette
For silhouette-based shape descriptors, the works by Macrini et al. [62], [74], [75], Denton et al. [63], and Cyr and Kimia [64] all use shock-based graphs of silhouette images to perform 2D-to-3D search [76], [77]. All of these works employ very small datasets of around twenty objects, suggesting they do not scale well to larger datasets, with Cyr and Kimia taking 5 minutes to perform a single query [64]. Moreover, they require a large number of different object views to perform a search. For example, Macrini et al. [74] use one hundred twenty-eight different silhouettes and observe sharp drops in accuracy as soon as the query space decreases below thirty-two views. Given the information lost in silhouette generation, this approach does not scale well to bigger datasets of similar objects. Conversely, the Coeus 2.5D Fourier fingerprint preserves key 3D features of the object, allowing for more robust search with higher discrimination and scalability.
C. Query by Depth
Several works have taken advantage of depth-image-based shape descriptors. Ohbuchi et al. [65] utilize Fourier coefficients for their depth images and develop the multiple orientations depth Fourier descriptor (MODFD), which is a 3D array of 2D polar Fourier coefficients across 42 depth-based images. In their experiments, this array contains 2688 (42 × 64) coefficients.
Stavropoulos et al. [66] identify salient features [78], [79] based on an object’s 2.5D depth-based image, detecting any sharp points of inflection. They employ a binary-search-like procedure to zero in on the ideal viewing angles of the 3D model for their shape descriptor. This limits the approach to digital 3D files, as it is very challenging for real-world cameras to achieve the < 0.2 degree viewpoint alignment required by their model; Coeus, on the other hand, does not have a strict viewing angle requirement. In their experiments with 400 models, they report only moderate exact matching accuracy (with random rotations) of 85% and 96% for top-1 and top-20 results, respectively.
D. ML-Based Queries
Several works [67], [68], [69], [80] perform 3D model retrieval by relying on neural networks to generate feature vectors. Specifically, these works use a neural network to combine multiple views of an object, extract shape features, weight the feature vectors for saliency, and combine the views into a single metric. The techniques developed in [67], [68], [69], and [80] all require a uniform way to sample views of 3D objects and a training set of 3D models. Therefore, it would be difficult to adapt them to real-world objects.
Moreover, the machine learning shape-based feature descriptors in [70], [71], and [72] can search for 3D models based on 2D images. All these works map 2D pictures and images of 3D parts to feature vectors using separate neural networks, then map the 2D features to the 3D features. Zhou et al. [70] developed a dedicated translator module to perform this mapping and incorporate the translator's loss against the ground truth into their network. Nie et al. [71] leverage pose estimation of the 2D image to enable search, while Liu et al. [72] develop a semi-supervised technique to train the networks. The main drawback of these networks is that 2D images and 3D objects are implicitly mapped based on class labels, with little focus on exact matching (unlike Coeus). The works by Zhou et al. and Nie et al. also rely on training with a labeled dataset; however, it remains unclear how inputs outside the distribution of the training data will perform. Notably, Coeus does not require a training dataset for its descriptors. Finally, these networks are subject to adversarial attacks [81].
E. Other Works
For view selection, Ansary et al. [23], [24] develop a set of characteristic views, downselecting the 3D model views with the most information. This downselection process enables them to eliminate poor viewing angles and build a lightweight database; for this reason, Ansary et al. can achieve higher scalability than Coeus. However, they did not develop any new shape descriptors; rather, they evaluated their characteristic view selection on several different non-Fourier shape descriptors, with Zernike moments performing the best. They also used raw photos of 3D models instead of silhouettes or depth-based images. Differences in print material and lighting affect Zernike moments, unlike Coeus’ 2.5D depth-based images, which makes it harder to translate such a system into a real-world application. In addition, they require 320 views of an object to perform a query, as well as descriptors from 23 different views even after downselection. Such requirements make this approach impractical for real-world applications.
Finally, Deniziak and Michno [73], [82] apply query by approximate shape to CBIR. This method uses edge detection to build a decision tree graph. While their descriptor contains potentially useful information, the graph structure is not scalable to larger datasets with exact match criteria. In addition, since this work focuses on shape-based CBIR and not 3D model search, the challenges of view selection were not explored by the authors.
F. Timing Results
Finally, we provide a brief overview of our timing results, summarized in Tables 2 and 3. Our timing experiments for Coeus were obtained using an AWS p2.xlarge instance with 4 CPU cores and a single NVIDIA K80 GPU. Only MarrNet (depth-based image generation) used the GPU for acceleration.
The timing profiles in Table 2 show that silhouette extraction and signature generation both scale very well, with much of the latency attributed to file and database I/O overheads. We further compare these timings to related work in Table 3. We remark that only a few works report timing figures, and they were recorded on vastly different hardware. Overall, we observe that the silhouette-based search of Coeus performs well compared to others, yet the depth-based search is slower due to the overheads introduced by MarrNet.
Conclusion
In this work, we present Coeus: a 3D search system that relies on enhanced 2D data and expands the scope of search to different stages of the additive manufacturing supply chain. As part of this search paradigm, we developed a new silhouette generator tool and integrated neural networks aimed at generating enhanced 3D models into a versatile search framework. In our methodology, we extract depth information from a 2D image and use Fourier fingerprints to match and retrieve 3D geometries from a database. In addition, Coeus features a robust “GCode to STL” compiler with multi-processing capabilities that exploit parallelization, which enables our search engine to retrieve 3D objects based on their GCode toolpath representation. In our experimental results, we demonstrate perfect accuracy for 3D model retrieval with a modest number of twenty 2D views of an object. Our findings open new research directions towards improving depth extraction for 2.5D depth-based images and identifying new systematic ways to capture 2D views with optimal shape information about a 3D object.
ACKNOWLEDGMENT
This article is based upon work supported by the National Science Foundation under Grant No. 2234974. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of NSF.
Resources
The Coeus framework is open-source and available online at https://github.com/TrustworthyComputing/Coeus.