Article

Construction of a 3D Model Knowledge Base Based on Feature Description and Common Sense Fusion

1 School of Arts and Communication, Beijing Normal University, Beijing 100875, China
2 School of Information Engineering, Ningxia University, Yinchuan 750014, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6595; https://doi.org/10.3390/app13116595
Submission received: 9 April 2023 / Revised: 25 May 2023 / Accepted: 26 May 2023 / Published: 29 May 2023

Abstract

Three-dimensional models represent the shape and appearance of real-world objects virtually, enabling users to obtain a comprehensive and accurate understanding by observing them from multiple perspectives. Semantic retrieval of 3D models is closer to human understanding, but the semantic annotation needed to describe 3D models is difficult to automate, and constructing an easy-to-use 3D model knowledge base remains difficult. This paper proposes a method for building a 3D model knowledge base to enhance the ability to intelligently manage and reuse 3D models. The knowledge comes from two sources: on the one hand, mapping rules constructed between 3D model features and semantics, and on the other hand, extraction from a common sense database. First, viewpoint orientations are established, semantic transformation rules for different feature values are defined, and the representation degree of each feature is graded so that the degree to which a contour approximates a regular shape can be described from different perspectives through classification; semantic descriptions of the contour, combined with spatial orientation, are then output automatically. Next, a 3D model visual knowledge ontology is designed from the top down, based on the upper ontology of the machine-readable comprehensive knowledge base and the relational structure of the ConceptNet ontology. Finally, using a weighted directed-graph representation with a sparse-matrix-integrated semantic dictionary as the carrier, an entity dictionary and a relation dictionary are established, covering attribute names and attribute values. The sparse matrix records the index information of the knowledge triples to form the 3D model knowledge base. The feasibility of this method is demonstrated by semantic retrieval and reasoning on the LabelMeshes dataset and a cultural relics dataset.

1. Introduction

A 3D model virtually represents the shape and appearance of real-world objects, and users can obtain comprehensive and accurate cognition by observing its appearance from multiple perspectives. Human understanding of the world comes mostly from vision: when observing 3D objects, people likewise obtain multiple two-dimensional pictures from different perspectives. Through vision, they perceive the size, light and shade, color, and motion or stillness of external objects. Once the brain receives information about objects from the eyes, it relates it to their spatial shape, color, and dynamics. It is therefore still necessary to describe 3D models through two-dimensional views taken from multiple perspectives. Two-dimensional images imply rich visual knowledge. However, knowledge extraction is mainly text-based, more expressive non-text resources are difficult to describe in natural language, and building an easy-to-use 3D model knowledge base remains very difficult.
With the continuous development of multimedia, virtual reality, and augmented reality technology, 3D data have become a key component of today's multimedia information systems. Sergiyenko proposed an optical laser sensor and an intelligent data management method for 3D machine vision, a very effective approach for exploring any spatial sector using robots [1]. The combination of optoelectronic devices and robotic systems has played an important role in attitude estimation, 3D reconstruction, and the digitization of cultural heritage. Sometimes, applying knowledge of these phenomena and their description can change the quality of a whole automated process [2]. To improve the reasoning ability of deep learning methods, Hudson et al. constructed a scene graph based on the visual entities, entity attributes, and entity relationships in annotated images, forming a dataset of 113,018 images with 22,669,678 question descriptions that combines visual knowledge extraction and question answering [3]. The labeled training data required by data-driven methods may be scarce in some fields, but much knowledge exists in the form of text. Hiranmay et al. discussed the different ways knowledge can be integrated into deep learning, dividing the knowledge needed for visual data processing into three levels. At the first level, image processing can incorporate knowledge of object regions and their attributes in images, close to the underlying features. At the second level, object recognition can incorporate the relationships between objects or between objects and regions. At the third level, scene understanding can incorporate high-level common sense about events and activities [4].
The fusion of knowledge is becoming more and more important, and the construction of common sense and domain knowledge bases promotes intelligent research. For example, large semantic knowledge bases such as the WordNet vocabulary library [5], YAGO [6], Cyc [7], and ConceptNet [8] are often applied in intelligent research. There are three main sources of knowledge. The first is natural language processing of unstructured text describing meaningful objects; for example, Zitnick et al. learned the visual features corresponding to semantic phrases from sentences describing visual scenes, the attributes of their objects, and their spatial relationships [9]. The NEIL project established a fixed vocabulary using the semi-supervised automatic extraction of scene object attributes and their relationships from internet data to form a large structured visual knowledge base [10]. The second is to collect knowledge through crowdsourcing, using volunteers to complete semantic descriptions manually. For images that cannot be described by text, abstract descriptions are designed; for example, Zitnick and Parikh introduced visual abstraction [11], allowing different participants to manipulate clip art to produce scenes that replace semantic descriptions. The third is to automatically annotate the meaningful visual information of images or 3D models based on their content and to generate ontology-based region annotations from automatically extracted features. For example, the MPEG-7 standard focuses on the description of multimedia content such as images and 3D models, enabling experts to define domain-specific rules and map specific combinations of low-level visual features (color, texture, shape, size, etc.) to their domain ontology to define high-level semantic terms [12]. Gupta et al.
extracted concave, convex, saddle-shaped, and other surface features from CAD models and developed a shape ontology to realize the equivalence relation between surfaces and semantics and to support the semantic interoperability of production parts [13,14]. The first and second knowledge extraction methods start from human natural language or are implemented through human operation, and the extracted knowledge is closer to high-level everyday knowledge. The third method starts from the features of the image or 3D model itself and maps them to high-level semantics, extracting knowledge that is closer to underlying shape and spatial knowledge.
Images and 3D models can better explain cognition than text, but extracting semantics directly from images or 3D models is very difficult. The ability to bring machines closer to human cognition and understanding is the driving force of artificial intelligence research, and how machines possess knowledge is a topic that researchers constantly explore. This article proposes a knowledge base construction method for 3D models that extracts features from different observation views and performs semantic transformation. The basic ontology is designed from top to bottom, and corpus data are added from bottom to top, absorbing ConceptNet common sense knowledge to complete the knowledge base construction. The key contributions of our work are as follows:
(1) We organically combine the abstract descriptors of text descriptions, observation views, and the 3D model itself, improving the intelligent management of models and realizing the effective reuse of 3D models.
(2) We design the basic ontology and then construct the corpus into the corresponding positions from the bottom up. In addition, the basic ontology itself and the instances are unified as entities to form an integrated knowledge base that supports reasoning and strengthens description, interoperability, and other functions.
(3) The 3D model visual knowledge ontology defines a clear specification, as the descriptors, images, and surfaces need to be associated in addition to text. The vertices in the graph structure need to be described concisely and effectively, and the relationships represented by the edges need to be correctly expressed through inheritance, transitivity, symmetry, and reflexivity, as well as through unconventional, slightly subtle, and equivalence characteristics.
The rest of the paper is organized as follows. Section 2 briefly discusses previous research and background. Section 3 presents the procedure of the 3D model knowledge base system. Our experimental evaluation is presented in Section 4. Finally, we conclude the paper and suggest directions for further research.

2. Related Work

Knowledge extraction is mainly text-based, and the more expressive 3D model is difficult to describe in natural language. The knowledge used to describe a 3D model mainly comes from semantic annotation. Once the relevant corpus is obtained, effective knowledge representation is required for better retrieval and reuse. Therefore, the related work is discussed from two aspects: 3D model semantic annotation and knowledge representation methods.

2.1. Semantic Annotation of a 3D Model

When using 3D models collected online for retrieval research, Min et al. found that shape-matching algorithms performed better than matching against the existing low-quality text associated with the 3D models; however, the fuzzy text was still useful for classification. They used WordNet to add synonyms and category names to model file names and combined this with shape matching to further improve retrieval performance [15]. Wang et al. first annotated the attributes of 3D models, developed a shape ontology from the top down, inferred the semantic attributes of 3D models using the Semantic Web Rule Language (SWRL), and completed 3D model retrieval [16]. Kassimi et al. extracted features from 3D models for effective semantic annotation, classified them, and derived other high-level semantics to expand the knowledge base using spatial relations, thus improving retrieval accuracy [17]. In the above studies, automatic semantic annotation was mainly based on low-level shape attribute descriptions such as convexity, roundness, and smoothness; high-level semantics close to human understanding were still annotated interactively.
The above 3D model semantic annotation was aimed at 3D model retrieval. The following analysis focuses on methods for improving semantic processing capability through annotation. Symonova et al. used a completely unsupervised segmentation algorithm to decompose 3D furniture models into components and used a domain ontology to map the underlying features to concepts, bridging the gap between keyword-based and instance-based query methods [18]. Gao et al. classified high-level semantic processing into three categories: online learning methods based on relevance feedback, offline machine learning methods, and defining high-level concepts through object ontology [19]. Catalano et al. provided a 3D model segmentation tool that described the objects segmented from the 3D model through semantic annotation and developed a feature ontology that characterized knowledge through the attributes of objects and their relationships with other parts [20].
Shapira et al. divided an object into multiple meaningful parts and used the annotations of the components to find similar parts in other objects, enhancing the ability to perform analysis and modeling tasks and enabling matching queries over the components of a 3D model [21]. Gupta et al. developed a shape ontology, established the equivalence relation between surface features such as concave, convex, and saddle shapes extracted from CAD free-form surface models and their semantics, and realized the semantic interoperability of production parts [14]. It can be seen that higher-level semantic processing is inseparable from extracting meaningful parts or significant areas of the 3D model, but it remains difficult to fully automate high-level semantic annotation.

2.2. Construction of a 3D Model Knowledge Base

Many general knowledge domains establish knowledge graphs to represent knowledge. Wang et al. defined a knowledge graph as a multi-relationship graph composed of entities and relationships, where entities are represented by nodes, and relationships are represented by different types of edges [22]. Unlike a common sense knowledge base, a domain knowledge graph pays more attention to the depth of knowledge in professional fields and has been widely used in geographic information, medicine, e-commerce, etc. [23]. Dworschak et al. proposed a semantic integration method for engineering design to realize knowledge ontology automation for the design and manufacturing of mechanical engineering parts. The knowledge mainly came from design documents, whose non-literal sketches recorded the starting and ending points of boundary, extrusion, rotation, and other operations to improve the description of the shape [24]. Zhang et al. studied the construction of a knowledge graph of cultural relics, achieving tasks such as entity extraction, relationship mining, knowledge fusion, and inference from natural language describing cultural relics and assisting in the construction of smart museums [25].
Ontology is the core of knowledge representation, organized according to public norms to promote knowledge sharing. Ontology does not inherently specify the mechanisms by which the represented knowledge is used in practice [26]. Khanam et al. established a broad basic ontology that included different domains [27]. When new concepts were discovered, they could be automatically assigned to different categories through WordNet, and each feature in the machine-readable comprehensive knowledge base was machine interpretable. Zhang et al. derived an ontology for the metal materials field from the YAGO open knowledge base using a string-matching algorithm and used the existing knowledge base to efficiently establish the required domain knowledge base [28]. Bharadwaj proposed a method for constructing multi-relationship, multi-level knowledge graphs based on the clustering of 3D shapes to achieve multi-modal search combining 3D components and text data obtained from previous design-related tasks [29].
In summary, the sources of knowledge are mostly extracted from text. Traditional knowledge representation methods have the advantages of accurate expression and easy reasoning, but they require manual construction, which is time-consuming and labor-intensive. For the knowledge description of 3D models, relevant corpus data are still scarce. Although some production-part knowledge bases extract knowledge directly from features, common 3D models do not all have regular shapes and precise production processes. In addition, the 3D model itself is difficult to describe in text, so different types of data are required to describe it.

3. Method

First, the spatial concept is combined with feature description to achieve automatic semantic extraction. We establish different viewpoint orientations for the 3D model; calculate feature values such as the roundness and rectangularity of the model contour; define the visual axis using the PCA principal axis and a rectilinearity optimization method; and calculate the symmetry. We then establish semantic conversion rules for different feature values, grade the degree of representation of different features, and automatically output hierarchical semantic descriptions of model contours combined with spatial orientation. Next, the 3D model visual knowledge ontology is designed based on the upper ontology of the machine-readable comprehensive knowledge base and the relational structure of the ConceptNet ontology, and a weighted directed-graph representation method is proposed. The sparse-matrix-integrated semantic dictionary is used as a carrier to establish an entity dictionary and a relation dictionary and to cover attribute names and attribute values. The sparse matrix records the index information of the knowledge triples to form the 3D model knowledge base.

3.1. Feature-to-Semantic Mapping Rules

People often reach a consensus on the upright, frontal pose of most objects and are accustomed to using it to define object orientation, which is instructive for describing and comparing 3D models. The semantic extraction rule combining orientation is based on the assumption that the 3D model is upright and frontal, to support the automatic output of the shape contour feature semantic description of the 3D model combined with spatial orientation. The definition of the spatial concept starts from the perspective of human observation of objects, describing three-dimensional models through the appearance of the objects presented from multiple perspectives. The subset of viewpoints includes the top, bottom, left, right, front, rear, front northwest, front northeast, front southwest, front southeast, rear northwest, rear northeast, rear southwest, and rear southeast. Figure 1 is a schematic diagram of several observation viewpoints on the front of the cultural relic horse. The viewpoint obtained between the front, upper, and left sides is the front northwest.

3.1.1. Feature Calculation

Roundness measures the degree of approximation between a shape and a circle: Round = 4π·Area/Per². The closer the contour is to a circle, the closer the roundness value is to 1. The minimum bounding rectangle (MBR) method defines rectangularity as the ratio of the contour area to the area of its minimum bounding rectangle, but the MBR is too sensitive to the shape boundary. Rosin relaxed the requirement that the rectangle must contain all boundary points; the selected rectangle only needs to contain most of the shape [30]. The robust rectangularity is R_R = 1 − (A_R + A_D)/A_I, where A_R is the area of the shape lying outside the selected rectangle, A_D is the area of the selected rectangle not covered by the shape, and A_I is the intersection area of the shape and the rectangle. This formula provides a tradeoff, forcing the rectangle to contain most of the shape while keeping the rectangle as small as possible. When calculating the robust rectangularity, we first compute the minimum bounding rectangle of the shape and then iterate, starting from half its area, to compute R_R; when (A_R + A_D)/A_I is smallest, the optimal rectangle is obtained. When the contour is rectangular, R_R is 1. Rectilinearity measures the degree to which a closed polygon has a rectilinear shape. Rectilinear structures usually correspond to artificial structures and are an important visual clue [31]. Zunic et al. defined the measure of rectilinearity as:
R_L(P) = (4/(4 − π)) · ( max_{α∈[0,2π]} Per₂(P)/Per₁(P,α) − π/4 ),
where Per₂(P) is the Euclidean perimeter of polygon P, and Per₁(P,α) is the l₁-metric perimeter of polygon P rotated by angle α. For any polygon P, R_L(P) ∈ (0,1], and R_L(P) = 1 if and only if P is rectilinear. R_L(P) is invariant under similarity transformations, and the closer R_L(P) is to 1, the more pronounced the polygon's rectilinear features are.
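As a sketch of the two contour measures above, assume a contour is given as a list of 2D vertex tuples; the choice of 720 sampled rotation angles for the rectilinearity maximization is an illustrative assumption, not taken from the paper:

```python
import math

def roundness(area, perimeter):
    # Round = 4*pi*Area/Per^2; equals 1 for a perfect circle
    return 4 * math.pi * area / perimeter ** 2

def perimeter_l2(poly):
    # Euclidean perimeter Per2(P)
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly)))

def perimeter_l1(poly, alpha):
    # l1 perimeter Per1(P, alpha) of the polygon rotated by alpha
    c, s = math.cos(alpha), math.sin(alpha)
    total = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        dx, dy = x2 - x1, y2 - y1
        total += abs(c * dx - s * dy) + abs(s * dx + c * dy)
    return total

def rectilinearity(poly, steps=720):
    # R_L(P) = 4/(4-pi) * (max_alpha Per2(P)/Per1(P,alpha) - pi/4)
    p2 = perimeter_l2(poly)
    best = max(p2 / perimeter_l1(poly, k * 2 * math.pi / steps) for k in range(steps))
    return 4 / (4 - math.pi) * (best - math.pi / 4)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(round(rectilinearity(square), 3))            # → 1.0 for an axis-aligned square
print(round(roundness(math.pi, 2 * math.pi), 3))   # unit circle → 1.0
```

Since Per₁ ≥ Per₂ for any edge, the ratio never exceeds 1, and the measure stays in (0,1] as stated.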

3.1.2. Semantic Transformation Rules

Based on the roundness, rectangularity, and rectilinearity, the visual axis and the symmetry description of the contour shape are defined by combining the roundness, rectangularity, aspect ratio, rectilinearity, and PCA. Human descriptions of shapes are often vague, and shape analogy is an effective way to describe them. Feature attribute values are easy to compare but relatively approximate. To perform semantic extraction, description ranges are defined over the degrees of the feature values to achieve the transformation from features to semantics.
The semantic description of the contour shape analogy is set to two levels: very similar and similar. By setting the degree to which the contour shape approximates circles and rectangles, the transformation from shape to semantics can be achieved well for perspectives with circular and rectangular contours. However, for complex shapes, it is difficult to use common geometric shapes for analogy. Therefore, shape sketches are retained and described only by class alias and orientation, such as "the shape of an aircraft front".
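A minimal sketch of this two-level feature-to-semantics rule follows; the cut-off values 0.95 and 0.85 are illustrative assumptions (the paper defines the two levels but this chunk does not state the thresholds), and the fallback label for complex contours mirrors the class-alias behavior described above:

```python
def shape_semantics(roundness, rectangularity, very=0.95, similar=0.85):
    # Hypothetical thresholds: map feature values to the two analogy levels.
    labels = []
    for value, shape in ((roundness, "circle"), (rectangularity, "rectangle")):
        if value >= very:
            labels.append(f"very similar to a {shape}")
        elif value >= similar:
            labels.append(f"similar to a {shape}")
    # Complex contours fall through and keep only the class alias, as in the paper.
    return labels or ["complex shape (described by class alias and orientation)"]

print(shape_semantics(0.97, 0.6))   # → ['very similar to a circle']
print(shape_semantics(0.3, 0.88))   # → ['similar to a rectangle']
```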
The calculation of the aspect ratio uses the minimum bounding rectangle of the shape to export the contour aspect ratio directly. At this point, the semantics record only numerical values without units; a higher-level description can be obtained in specific situations only by combining more knowledge. The semantics directly extracted from shapes are only attribute values; after establishing the correspondence between the knowledge base and the shape, it becomes much easier to change the shape from new knowledge in turn. When actual size measurements are introduced, the shape size can be adjusted and unified with the real world to support better utilization of virtual models.
Many complex shapes can be obtained from different perspectives, and it is difficult to describe them using analogies alone. However, there is generally a basic understanding of a shape's axis. First, we calculate the distance from the contour to its centroid; then, we calculate the rotation angle using the PCA and rectilinearity methods. Finally, we calculate the effective number of pixels that the rotated shape projects onto the x and y axes and select the axis with the minimum number of pixels as the visual axis.
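The PCA step of this procedure can be sketched as follows, assuming the contour is a list of 2D points; this computes only the principal-axis angle (the covariance eigenvector orientation), while the projection and axis-selection steps are omitted:

```python
import math

def pca_angle(points):
    # Principal-axis angle of a 2D point set via the 2x2 covariance matrix:
    # 0.5 * atan2(2*s_xy, s_xx - s_yy) gives the orientation of the
    # largest eigenvector of [[s_xx, s_xy], [s_xy, s_yy]].
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Points spread along the line y = x: the principal axis lies at 45 degrees
pts = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(round(math.degrees(pca_angle(pts))))  # → 45
```

Rotating the contour by the negative of this angle aligns its principal axis with the x axis before the pixel projections are counted.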
The visual axis describes the intermediate axis of a shape, and its symmetry can be calculated with respect to this axis or its perpendicular. For each point inside the shape, the indicator I_w is 1 if its mirror image about the axis also lies inside the shape and 0 if it lies outside. The symmetry is calculated as Sym = (Σ_{i=1}^{n} I_w(x_i, y_i)) / Area; when the shape is completely symmetric, the value of Sym is 1, and the lower the value, the worse the symmetry. The degree of symmetry is described in three levels: the shape is described as symmetric about the axis when Sym is in (0.9, 1], approximately symmetric when Sym is in (0.7, 0.9], and asymmetric otherwise.
In addition to intuitive semantic descriptions, abstract descriptions of observation views from different perspectives are produced using the methods in [32]. A histogram of threshold words based on information entropy is used to abstract the representative views of 3D models, making it convenient to compare the similarity of complex observation views across different models.

3.1.3. Three-Dimensional Model Ontology

The design of the ontology can be either top-down based on classification, or bottom-up based on extraction from existing knowledge bases. We combine the two methods, first designing the basic ontology and then constructing the corpus in the corresponding positions from bottom to top. In addition, the basic ontology itself and the instances are unified as entities to form an integrated knowledge base to achieve reasoning and strengthen the description and interoperability functions.
In the ontology-based construction method of a machine-readable comprehensive knowledge base, the ontology establishes a broad hierarchical structure, and all concepts are assigned under the three root categories of existence, science, and part of speech [27]. The 3D model knowledge base is intended to associate heterogeneous data such as text, pictures, and 3D mesh surfaces. When a description in the knowledge base that is equivalent across text and surface meshes is updated, the corresponding interoperability operation is carried out; for example, if the size description of a model in the knowledge base is changed, the corresponding size of the 3D model is updated. The hierarchical classification in the ontology design draws inspiration from the machine-readable comprehensive knowledge base construction method, while the relationship selection draws inspiration from ConceptOnto. All knowledge is organized in the form of triples, and the generation of entities and relationships is the core of ontology development. For an entity attribute name and attribute value, the attribute name is defined as a relationship, and the attribute value is defined as an abstract description entity, which can be a numerical value or a vector. All entities appear as strings and are converted to the corresponding data type when read by different applications.
The top-down specification of the 3D model knowledge foundation ontology automatically aligns the semantic descriptions obtained in Section 3.1 and the corresponding knowledge from the ConceptNet common sense database with the classification of the ontology. Knowledge representation with a weighted digraph structure is realized by the sparse-matrix-integrated semantic dictionary. Based on the constructed 3D model knowledge base, applications such as relational property reasoning, 3D model semantic retrieval, and semantic interoperability can be realized. The hierarchical structure of the basic ontology is shown in Figure 2. Existence and science are the highest levels, and the classification within science is consistent with the machine-readable comprehensive knowledge base ontology, preparing for knowledge expansion, for example, adding specialized knowledge such as the density and material of objects, which cannot be directly extracted from three-dimensional shapes. Existence is the core of the representation of the 3D model after knowledge extraction and is divided into physical existence and abstract existence.
The hierarchical structure is beneficial for the inheritance of attributes, and the standardization of relationships is the key to improving the ability to describe knowledge. We include supersets and instances among the relation terms, as well as attribute names such as height, width, thickness, color, and angle, to suit the unified representation of the graph structure. In order to automatically add existing knowledge from the ConceptNet common sense database, we select the set of relationships to include in the ontology's relation words. As shown in Table 1, the first column contains the name of the relationship, the second its characteristics, the third the opposite relationship, and the fourth the original relationship name selected from ConceptOnto. In addition to the commonly used transitive, symmetric, and reflexive characteristics of relationships, inheritance and degree-of-description markers are also added. Inheritance is reflected in the instantiation process, where an instantiated entity inherits a set of relationships from its class, meaning that each instantiated object can obtain the relevant knowledge of the corresponding ontology leaf entity. In addition, the "very" and "slightly" markers can be reflected in similarity relations, corresponding to the adjectives "very similar" and "slightly similar", enriching the ability to describe knowledge. For attribute relations, the characteristic is defined as equivalence; for an equivalence relation, the corresponding 3D model can be updated according to modifications in the knowledge base to achieve semantic interoperability.

3.2. Weighted Directed-Graph Representation

The 3D model visual knowledge ontology defines a clear specification, as the descriptors, images, and surfaces need to be associated in addition to text. The vertices in the graph structure need to be described concisely and effectively, and the relationships represented by the edges need to be correctly expressed through inheritance, transitivity, symmetry, and reflexivity, as well as through unconventional, subtle, and equivalence characteristics. The defined opposite relationships also need to be effectively reflected. We use weighted directed graphs for representation, where vertices represent entities and edges represent relationships. The triple form is very suitable for representation by a graph structure, to which graph-theoretic algorithms can then be applied. The specific implementation uses a sparse-matrix-integrated semantic dictionary.
In the directed graph, the edges represent the relationships between entities. The specific data of the same category in the 3D model appear as instances corresponding to the classification in the basic ontology, and their relationship is represented by has_instance. Specific instance models are named as the class name plus the absolute path, such as cup_D-LabelMeshes-cup-1.obj. As it is a local knowledge base, it is convenient to locate the model directly from its name. The specific instances in the aircraft category can inherit the relevant knowledge of the aircraft class without it being added separately. For example, after adding the ConceptNet knowledge that cups are used for drinking water, it can be concluded that the instantiated cup data can also be used for drinking water; when reusing the corresponding 3D model, rich knowledge is thus available. Similarly, representative contours, depth maps, sketches, descriptors, and prominent areas that are difficult to describe in text are also named in this way.
Each 3D model and its related data newly added to the knowledge base are instances of the corresponding leaf nodes of the basic ontology, and the growing set of instances needs to establish subset relationships with these nodes. Obviously, the edges representing relationships are densely distributed where they connect to the leaf node entities of the underlying ontology, while the relationships between the newly added instances are very sparse. Entities imported from ConceptNet are obtained using a keyword search. Adding entities not defined in the ontology to the neighborhood of a leaf node instance of the basic ontology does not affect the hierarchical structure of the basic ontology. From the perspective of storing the graph as an adjacency matrix, a sparse matrix is very suitable: one can build a very large integer sparse matrix to store the graph structure. We establish an entity dictionary in which the serial number of each entity is the row number of the sparse matrix; the most commonly used leaf node entities of the basic ontology are placed at the front of the dictionary for efficient searching.
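A minimal sketch of this storage scheme follows; the class name and method names are hypothetical, and a dict-of-dicts stands in for the integer sparse matrix (a real implementation would use an actual sparse matrix library):

```python
class KnowledgeStore:
    """Sketch: an entity dictionary maps names to row/column indices, a
    relation dictionary maps relation names to index values, and a
    dict-of-dicts stands in for the sparse relation matrix."""

    def __init__(self):
        self.entities = {}   # entity name -> row/column index
        self.relations = {}  # relation name -> index in the relation dictionary
        self.matrix = {}     # row index -> {column index: relation index}

    def _eid(self, name):
        # Entities are numbered in insertion order, so frequently used
        # ontology leaf entities added first sit at the front of the dictionary.
        return self.entities.setdefault(name, len(self.entities))

    def add_triple(self, head, relation, tail):
        rid = self.relations.setdefault(relation, len(self.relations) + 1)
        self.matrix.setdefault(self._eid(head), {})[self._eid(tail)] = rid

    def query(self, head, relation):
        h, rid = self.entities.get(head), self.relations.get(relation)
        inv = {i: n for n, i in self.entities.items()}
        return [inv[t] for t, r in self.matrix.get(h, {}).items() if r == rid]

kb = KnowledgeStore()
kb.add_triple("cup", "has_instance", "cup_D-LabelMeshes-cup-1.obj")
kb.add_triple("cup", "used_for", "drinking water")
print(kb.query("cup", "has_instance"))  # → ['cup_D-LabelMeshes-cup-1.obj']
```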
The relations and characteristics of the edges are stored in two sparse matrices of the same size. One stores the serial numbers of the relational words in the relational dictionary and is called the relational matrix; the other stores the characteristics of the edges and is called the characteristic matrix. The values in the relational matrix are designed as the indexes of the relational words in the relational dictionary, so they can be read directly when imported. The characteristic matrix is used for reasoning, for strengthening descriptions, and for semantic interoperability. The default weight value is 1, and different weight values represent different relationship characteristics. A relationship can be defined as transitive, symmetric, reflexive, inherited, approximate, vaguely, or equivalent, as shown in Table 2. The weight value is a decimal integer whose binary form (minus one) encodes the relationship characteristics; the seven characteristics occupy seven bit positions, as shown in the second column of the table. For example, a weight value of 128, that is (1111111)+1, has all seven binary digits set to 1, meaning that the edge carries all seven characteristics simultaneously. The transitive, symmetric, and reflexive characteristics are used for reasoning; inheritance is used to obtain the knowledge of the category to which an instance belongs; the approximate and vaguely characteristics describe degrees of reinforcement; and the equivalent characteristic supports semantic interoperability.
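The bit-encoded weights of Table 2 can be illustrated with a small sketch. The flag values come from the table, and the stored weight is the combined flags plus 1 (so the default weight 1 means "no characteristics"); the helper functions are hypothetical.

```python
# Characteristic flags from Table 2 (binary position markers).
TRANSITIVE, SYMMETRIC, REFLEXIVE, INHERIT = 1, 2, 4, 8
APPROXIMATE, VAGUELY, EQUIVALENT = 16, 32, 64

def encode(*flags):
    """Combine characteristic flags into a stored weight (flag bits + 1)."""
    bits = 0
    for f in flags:
        bits |= f
    return bits + 1

def has_flag(weight, flag):
    """Test whether a stored weight carries a given characteristic."""
    return (weight - 1) & flag != 0

# An edge with all seven characteristics gets weight (1111111)+1 = 128.
all_seven = encode(TRANSITIVE, SYMMETRIC, REFLEXIVE, INHERIT,
                   APPROXIMATE, VAGUELY, EQUIVALENT)
```

A single integer per edge thus records any subset of the seven characteristics, which keeps the characteristic matrix as sparse as the relational matrix.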
Inference in the graph structure is the process of materializing the edges implied by the transitive, symmetric, reflexive, and inherited characteristics and then resetting those characteristic bits toward the default weight (0000000)+1, thereby transforming the graph structure. If a, b, and c belong to the entity set X and R is a relation between entities, transitivity is defined as ∀a, b, c ∈ X: (a R b ∧ b R c) ⇒ a R c. When reasoning in the graph structure, we first find the consecutively connected edges with transitive weights; then, we connect all the entities along these edges in pairs; finally, we clear the transitive weights and update the corresponding positions in the relational matrix. Symmetry is defined as ∀a, b ∈ X: a R b ⇒ b R a; during reasoning, an edge is added at the symmetric position in the matrix, and the corresponding position in the relational matrix is updated. A reflexive relation relates every entity carrying it to itself; in this case, a 1 on the diagonal represents an edge from the entity to itself, and the relational matrix is updated. For the inheritance characteristic, an edge is pointed from the class to the instantiated entity so that the instance obtains the corresponding knowledge. For inverse relationships, the initial weight of the inverse relation in the relational matrix is negative; during inference, the symmetric position of the matrix is assigned the inverse relation, and the weight value is normalized to complete the operation.
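A minimal sketch of the symmetric and transitive inference steps described above, operating on a plain Python edge set rather than the paper's sparse matrices; the function names are illustrative.

```python
def infer_symmetric(edges):
    """For every edge (a, b) of a symmetric relation, add (b, a)."""
    return edges | {(b, a) for (a, b) in edges}

def infer_transitive(edges):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) exist, i.e. the
    transitive closure: (a R b and b R c) implies a R c."""
    closure = set(edges)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new <= closure
        closure |= new
    return closure
```

On the sparse matrices, the same completion corresponds to writing the implied entries into the relational matrix and then clearing the transitive or symmetric bits in the characteristic matrix.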

4. Results

4.1. Semantic Retrieval of 3D Models

Once a 3D model visual knowledge base integrating heterogeneous data, such as text, images, descriptors, and 3D mesh surfaces, has been established, semantic retrieval is very convenient. After locating the query words in the entity and relationship dictionaries of the knowledge base, the corresponding entities and properties can be retrieved in the graph structure by searching for connecting paths.
Figure 3 shows the local graph structure of the 3D model visual knowledge base, where the blue entities are the basic ontology defined from top to bottom together with the common sense imported from ConceptNet (is_used_for-drinking). Firstly, we set the edges of the layers above the leaf entities of the basic ontology to 0 to disconnect them, converted the graph into an undirected graph, and used the leaf node entities of each major class as the endpoints of the detection paths. Secondly, we located the entity words and relationship words of the semantic query in the entity and relationship dictionaries, found the shortest connecting paths between the entity words and all set endpoints, and preserved the paths containing the relationship words; if the query contained strengthening descriptors, we found the corresponding positions in the characteristic matrix and filtered the paths again. Thirdly, the instantiated entities of the leaf entities on the retained paths were the search results; if a path did not include an instantiated entity, all instantiated entities under that leaf entity were the search results. Fourthly, if the query contained multiple entities and relationships, the result was the intersection of the results obtained from each path. The red entities are instances of specific 3D models; their entities and properties were calculated from the data and aligned from bottom to top. The relationships shown as red dashed lines were automatically inferred from the inheritance characteristic of has_instance. For example, to retrieve a 3D model that looks very similar to a circle from above, we first calculated the shortest paths from the starting point to the endpoint of the circular entity and then filtered the results connected to the top entity, obtaining the entity cup_D-LabelMeshes-21.obj, where two paths overlap. Finally, the strengthening (approximate) characteristic of is_similar_to in the characteristic matrix was filtered along the path from the circle to the cup to obtain the results. If the model used for drinking water was retrieved instead, all instantiated entities of the cup were returned.
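The path-based retrieval described above can be sketched with a breadth-first search over the undirected graph; the toy adjacency data and names below are illustrative stand-ins for the knowledge base, loosely following the circle-to-cup example in the text.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search returning one shortest path, or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy undirected neighborhood around the "circle -> cup" example.
adj = {
    "circle": ["cup_view_top"],
    "cup_view_top": ["circle", "cup"],
    "cup": ["cup_view_top", "cup_D-LabelMeshes-21.obj"],
    "cup_D-LabelMeshes-21.obj": ["cup"],
}
print(shortest_path(adj, "circle", "cup"))  # shortest path via the top view
```

In the actual knowledge base, the endpoints are the leaf node entities of the basic ontology, and the candidate paths are further filtered by the relational and characteristic matrices.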
We constructed a 3D model knowledge base on the LabelMeshes dataset [33] of 380 models in 19 categories. When retrieving the models with views closest to a rectangle, the shortest path distance was three, and 186 views of 42 models were returned, distributed across the mech, plier, vase, table, cup, and chair categories. The front view of 39.obj in the cup category and the front right upper view of 339.obj in the mech category were both very similar to rectangles, revealing the visual similarity hidden in different 3D models viewed from different perspectives.

4.2. Performance of the Knowledge Base Reasoning

(1) To analyze the performance of the knowledge base inference methods, a sparse relational matrix and a characteristic matrix of 1,000,000 × 1,000,000 were established. The entity dictionary was built from the designed basic ontology, the 3D instance models were added automatically, and the has_instance entries in the relational matrix were updated. We automatically added instances of the calculated multi-view contours, aspect ratios, and visual axes, and transformed the semantics of the approximate circles, rectangles, and symmetries based on the set ranges. We automatically added the component properties provided by LabelMeshes and, finally, the common knowledge related to the dataset categories in ConceptNet. Only the triplet data whose head entities were related categories were retained, resulting in a knowledge base containing 30,759 entities and 268,640 properties. The runtime for executing the inference on a computer with a 2.30 GHz Intel Core i5-8300 and 8 GB RAM is shown in Table 3. In total, 17,807 new properties were added by the transitive feature reasoning and 2744 by the symmetric feature reasoning. The reflexive relations mainly came from is_related_to, adding 193,507 new properties, and the inheritance relations came from has_instance, adding 112,227 new properties. The execution time was mainly consumed by finding the corresponding features; the transitive feature was slow because of the deep traversal required by chains of multiple transitive relations.
(2) To analyze the retrieval performance, the constructed visual knowledge base was repeatedly expanded and the models with views closest to a rectangle were retrieved. During each expansion, the basic ontology and the added 3D model entities remained unchanged, while the remaining entities and the corresponding properties were copied and appended at the end of the rows and columns of the characteristic matrix. In this way, the number of entities roughly doubled and the number of properties roughly quadrupled at each step. The numbers of entities and properties, the occupied space, and the retrieval times are shown in Table 4. The execution time of the retrieval nearly doubled at each step, approaching the growth rate of the number of entities; although the number of properties increased much faster, its impact on the time was small because the added relationships did not increase the properties on the relevant paths. Retrieving properties within a one-ring neighborhood was efficient, but retrieval was not suitable for real-time use when the distance between the entities to be retrieved was 3, although it remained within an acceptable range.

5. Conclusions and Future Work

This paper proposed a representation method for the visual knowledge of 3D models, constructed a basic ontology with a broad hierarchical structure, and established multiple relational words and specifications for their characteristics. We organically organized different forms of data, such as text, images, descriptors, and 3D surfaces, and combined them in a top-down and bottom-up manner to form a 3D model knowledge base. We proposed a weighted directed-graph representation method that achieves the correct expression and inference of relationship characteristics through the transitive, symmetric, reflexive, inherited, approximate, vaguely, and equivalent characteristics. The implementation integrates a semantic dictionary with sparse matrices and uses the graph structure to express knowledge triplets. Both the basic ontology and the specific entities and properties are easy to expand and can be shared. Once a knowledge base composed of different types of data, such as text, images, descriptors, and 3D models, has been established, the entities and properties corresponding to a semantic query can be conveniently retrieved in the graph structure by searching for connected paths. The method also supports the semantic retrieval of multiple entities, making the reuse of 3D models more intuitive and effective, and the retrieval results can be accompanied by common knowledge that enriches the interpretation of the model. In future research, this method will be applied to digital management in the field of cultural heritage protection, with further consideration given to updating the knowledge base with actual dimensions and other attributes. Research will also be conducted on constructing various complex shape parts from the knowledge descriptions to enhance the intelligent application of the 3D model knowledge base.

Author Contributions

Conceptualization, P.Z.; methodology, S.Z.; software, S.Z.; validation, P.Z. and S.Z.; formal analysis, S.Z.; investigation, P.Z. and S.Z.; resources, P.Z. and S.Z.; data curation, P.Z. and S.Z.; writing—original draft preparation, S.Z.; writing—review and editing, S.Z.; visualization, S.Z.; supervision, P.Z.; project administration, P.Z. and S.Z.; funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program named Comprehensive public cultural resource cloud service platform and Resource Pool Construction (2020YFC523303).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The LabelMeshes Dataset used for this study can be accessed at https://people.cs.umass.edu/kalo/papers/LabelMeshes/index.html (accessed on 12 February 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sergiyenko, O.Y.; Tyrsa, V.V. 3d optical machine vision sensors with intelligent data management for robotic swarm navigation improvement. IEEE Sens. J. 2021, 21, 11262–11274. [Google Scholar] [CrossRef]
  2. Sergiyenko, O. Optoelectronic Devices in Robotic Systems; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
  3. Hudson, D.A.; Manning, C.D. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 6693–6702. [Google Scholar] [CrossRef]
  4. Ghosh, H. Computational Models for Cognitive Vision; John Wiley & Sons: Hoboken, NJ, USA, 2020. [Google Scholar]
  5. Miller, G.A. WordNet: A Lexical Database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
  6. Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A Core of Semantic Knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW’07, New York, NY, USA, 8 May 2007; pp. 697–706. [Google Scholar] [CrossRef]
  7. Lenat, D.B. CYC: A Large-Scale Investment in Knowledge Infrastructure. Commun. ACM 1995, 38, 33–38. [Google Scholar] [CrossRef]
  8. Speer, R.; Chin, J.; Havasi, C. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, San Francisco, CA, USA, 4–7 February 2017; pp. 4444–4451. [Google Scholar]
  9. Zitnick, C.L.; Parikh, D.; Vanderwende, L. Learning the Visual Interpretation of Sentences. In Proceedings of the 2013 IEEE International Conference on Computer Vision, ICCV ’13, Sydney, NSW, Australia, 1–8 December 2013; pp. 1681–1688. [Google Scholar] [CrossRef]
  10. Chen, X.; Shrivastava, A.; Gupta, A. NEIL: Extracting Visual Knowledge from Web Data. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 1409–1416. [Google Scholar] [CrossRef]
  11. Zitnick, C.L.; Parikh, D. Bringing Semantics into Focus Using Visual Abstraction. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3009–3016. [Google Scholar] [CrossRef]
  12. Kim, D.H.; Park, I.K.; Yun, I.D.; Lee, S.U. A New MPEG-7 Standard: Perceptual 3-D Shape Descriptor. In Proceedings of the Advances in Multimedia Information Processing—PCM 2004: 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, 30 November–3 December 2004; Aizawa, K., Nakamura, Y., Satoh, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 238–245. [Google Scholar]
  13. Gupta, R.K.; Gurumoorthy, B. Automatic Extraction of Free-Form Surface Features (FFSFs). Comput. Aided Des. 2012, 44, 99–112. [Google Scholar] [CrossRef]
  14. Gupta, R.K.; Gurumoorthy, B. Feature-based ontological framework for semantic interoperability in product development. Adv. Eng. Inform. 2021, 48, 101260. [Google Scholar] [CrossRef]
  15. Min, P.; Kazhdan, M.; Funkhouser, T. A Comparison of Text and Shape Matching for Retrieval of Online 3D Models. In Proceedings of the Research and Advanced Technology for Digital Libraries: 8th European Conference, ECDL 2004, Bath, UK, 12–17 September 2004; Heery, R., Lyon, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 209–220. [Google Scholar]
  16. Wang, X.; Lv, T.; Wang, S.; Wang, Z. An Ontology and SWRL Based 3D Model Retrieval System. In Proceedings of the Information Retrieval Technology: 4th Asia Information Retrieval Symposium, AIRS 2008, Harbin, China, 15–18 January 2008; Li, H., Liu, T., Ma, W.Y., Sakai, T., Wong, K.F., Zhou, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 335–344. [Google Scholar]
  17. Kassimi, M.A.; Beqqali, O.E. Semantic based 3D model retrieval. In Proceedings of the 2012 International Conference on Multimedia Computing and Systems, Tangiers, Morocco, 10–12 May 2012; pp. 195–199. [Google Scholar]
  18. Symonova, O.; Dao, M.S.; Ucelli, G.; de Amicis, R. Ontology Based Shape Annotation and Retrieval. In Proceedings of the C&O@ECAI, Riva del Garda, Italy, 28 August 2006. [Google Scholar]
  19. Gao, B.; Zheng, H.; Zhang, S. An Overview of Semantics Processing in Content-Based 3D Model Retrieval. In Proceedings of the 2009 International Conference on Artificial Intelligence and Computational Intelligence, Shanghai, China, 7–8 November 2009; Volume 2, pp. 54–59. [Google Scholar] [CrossRef]
  20. Catalano, C.E.; Falcidieno, B.; Attene, M.; Robbiano, F.; Spagnuolo, M. Shape Knowledge Annotation for Virtual Product Sharing and Reuse. Eng. Syst. Des. Anal. 2008, 48371, 257–264. [Google Scholar]
  21. Shapira, L.; Shalom, S.; Shamir, A.; Cohen-Or, D.; Zhang, H. Contextual Part Analogies in 3D Objects. Int. J. Comput. Vision 2010, 89, 309–326. [Google Scholar] [CrossRef]
  22. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743. [Google Scholar] [CrossRef]
  23. Lin, J.; Zhao, Y.; Huang, W.; Liu, C.; Pu, H. Domain Knowledge Graph-Based Research Progress of Knowledge Representation. Neural Comput. Appl. 2021, 33, 681–690. [Google Scholar] [CrossRef]
  24. Dworschak, F.; Kügler, P.; Schleich, B.; Wartzack, S. Integrating the Mechanical Domain into Seed Approach. Proc. Des. Soc. Int. Conf. On Eng. Des. 2019, 1, 2587–2596. [Google Scholar] [CrossRef]
  25. Zhang, M.; Geng, G.; Zeng, S.; Jia, H. Knowledge Graph Completion for the Chinese Text of Cultural Relics Based on Bidirectional Encoder Representations from Transformers with Entity-Type Information. Entropy 2020, 22, 1168. [Google Scholar] [CrossRef] [PubMed]
  26. Jakus, G.; Milutinovic, V.; Omerovic, S.; Tomazic, S. Concepts, Ontologies, and Knowledge Representation; Springer Publishing Company, Incorporated: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  27. Khanam, S.A.; Liu, F.; Chen, Y.P.P. Comprehensive Structured Knowledge Base System Construction with Natural Language Presentation. Hum.-Centric Comput. Inf. Sci. 2019, 9, 23. [Google Scholar] [CrossRef]
  28. Zhang, X.; Pan, D.; Zhao, C.; Li, K. MMOY: Towards deriving a metallic materials ontology from Yago. Adv. Eng. Inform. 2016, 30, 687–702. [Google Scholar] [CrossRef]
  29. Bharadwaj, A.G.; Starly, B. Knowledge graph construction for product designs from large CAD model repositories. Adv. Eng. Inform. 2022, 53, 101680. [Google Scholar] [CrossRef]
  30. Rosin, P. Measuring shape: Ellipticity, rectangularity, and triangularity. In Proceedings of the 15th International Conference on Pattern Recognition, ICPR-2000, Barcelona, Spain, 3–8 September 2000; Volume 1, pp. 952–955. [Google Scholar] [CrossRef]
  31. Zunic, J.; Rosin, P. Rectilinearity measurements for polygons. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1193–1200. [Google Scholar] [CrossRef]
  32. Zeng, S.; Geng, G.; Zhou, M. Automatic Representative View Selection of a 3D Cultural Relic Using Depth Variation Entropy and Depth Distribution Entropy. Entropy 2021, 23, 1561. [Google Scholar] [CrossRef]
  33. Kalogerakis, E.; Hertzmann, A.; Singh, K. Learning 3D Mesh Segmentation and Labeling. ACM Trans. Graph. 2010, 29, 1–12. [Google Scholar] [CrossRef]
Figure 1. A schematic diagram of several observation viewpoints on the cultural relic horse.
Figure 2. The hierarchical structure of the basic ontology of the 3D model knowledge base.
Figure 3. Schematic diagram of the local entity relationship in the 3D model knowledge base.
Table 1. Property specification. I: inherit, T: transitive, S: symmetric, R: reflexive, A: approximate, V: vaguely, E: equivalent.
| Property | Properties | Inverse | Original Property |
|---|---|---|---|
| is | | | IsA |
| has | | | HasA |
| has_instance | I | | |
| is_observed_from | | | |
| is_superclass_of | T | is_subclass_of | |
| is_similar_to | T, S, A, V | | SimilarTo |
| is_symmetric_about | | | |
| is_used_for | | | UsedFor |
| is_related_to | T, R | | RelatedTo |
| is_part_of | T | has_part | PartOf |
| has_antonym | S | | Antonym |
| has_synonym | T, S | | Synonym |
| height_is | E | | |
| width_is | E | | |
| thickness_is | E | | |
| color_is | E | | |
| angle_is | | | |
| aspect_ratio_is | | | |
Table 2. The weights of relational features.
| Property | Weight | Function |
|---|---|---|
| Transitive | 1 (0000001)+1 | Complete knowledge |
| Symmetric | 2 (0000010)+1 | Complete knowledge |
| Reflexive | 4 (0000100)+1 | Complete knowledge |
| Inherit | 8 (0001000)+1 | Complete knowledge |
| Approximate | 16 (0010000)+1 | Strengthen description |
| Vaguely | 32 (0100000)+1 | Strengthen description |
| Equivalent | 64 (1000000)+1 | Semantic interoperability |
Table 3. Knowledge base inference execution time.
| Property | Number of Properties before Inference | Number of Properties after Inference | Time (s) |
|---|---|---|---|
| Transitive | 268,640 | 286,447 | 726.01 |
| Symmetric | 286,447 | 289,191 | 8.85 |
| Reflexive | 289,191 | 482,698 | 364.93 |
| Inherit | 482,698 | 594,925 | 8.26 |
Table 4. Semantic retrieval execution time after expanding the knowledge base.
| Number of Entities | Number of Relations | Storage Space (kb) | Time (s) |
|---|---|---|---|
| 30,759 | 268,640 | 622 | 224.9 |
| 61,061 | 1,017,799 | 1759 | 453.2 |
| 121,665 | 3,958,911 | 6967 | 1041.3 |
| 242,873 | 15,612,319 | 38,293 | 2516.1 |
