Article

Challenges of Data Refining Process during the Artificial Intelligence Development Projects in the Architecture, Engineering and Construction Industry

Department of Architectural Engineering, College of Engineering, Dankook University, 152 Jukjeon-ro, Yongin-si 16890, Gyeonggi-do, Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(22), 10919; https://doi.org/10.3390/app112210919
Submission received: 30 October 2021 / Revised: 12 November 2021 / Accepted: 15 November 2021 / Published: 18 November 2021
(This article belongs to the Topic Artificial Intelligence (AI) Applied in Civil Engineering)

Abstract

This paper examines the large amount of human resources required in the research and development (R&D) process of artificial intelligence (AI) and discusses factors to consider in the current development method. A division of labor between a few managers and numerous ordinary workers, as in light industry, appears to be a plausible method of enhancing the efficiency of AI R&D projects. Thus, the research team regards the development process of AI, which maximizes production efficiency by handling digital resources named ‘data’ with mechanical equipment called ‘computers’, as the digital light industry of the fourth industrial era. As experienced during the previous Industrial Revolution, if human resources are efficiently distributed and utilized, progress comparable to that observed in the second Industrial Revolution can be expected in the digital light industry, and human resource development for this purpose is considered urgent. Based on current AI R&D projects, this study conducted a detailed analysis of the tasks necessary for each AI learning step and investigated the urgency of training R&D human resources. If human resources are educated and trained, this could lead to specialized development, and new value creation in the AI era can be expected.

1. Introduction

Over the past half century, South Korean entrepreneurs have transformed the originally agriculture-focused country into one centered on light and heavy industries, and following the third Industrial Revolution era, South Korea built a state-of-the-art electronics industry. Furthermore, some South Korean companies that emerged after the 2000s now rank among the top companies globally. Not only have they established artificial intelligence (AI) research and development (R&D) centers, but they have also aggressively developed professional human resources in order to become globally leading companies in the fourth Industrial Revolution era. Notably, in the construction and transportation industries, the AI-based indoor mapping and positioning technologies developed by Naver Labs are acknowledged as top tier [1], and KAKAO BRAIN has developed a technology that achieves world-class performance in learning from images without an image labeling process [2,3]. According to King et al. [4], AI will soon be applied to all industrial sectors around the globe at a fast pace.
The South Korean government has pursued the Data Dam project, a digital infrastructure investment initiative, since mid-2020 [5]. The government announced that the Data Dam project would have a total budget of KRW 292.5 billion (approximately USD 252 million) for the first half of 2021 and plans to collect training data from 84 areas, including vision, geographic information, healthcare, and construction. The aim of this project is to enable research institutes and private companies to focus on AI-related tasks while the government develops and establishes the AI learning datasets, a step that otherwise takes approximately 80% of the total development process time and delays AI R&D projects. Consequently, the project is expected to have a positive effect on the creation of mass employment as well as on innovations in AI research.
Figure 1 depicts a schematic diagram of data acquisition, refinement, and verification during the execution of AI R&D projects. The datasets used in AI R&D projects can be divided into those based on image data and those based on time series data. The overall process of constructing a dataset comprises raw data acquisition, segmentation, and labeling. In general, after the raw data are collected, a refinement process to adjust size and shape and a labeling process, including annotation and segmentation, are carried out to create the learning dataset from the two types of data. Additionally, the refined data are re-classified into the weighted model referred to as the brain of the AI. The final step of AI dataset construction is quality control, which determines which data to use in order to ensure optimum results [6,7,8,9,10,11,12,13,14,15,16,17,18,19]. While most previous studies divided the dataset construction process into raw data acquisition, pre-conditioning, data refinement, data labeling, and dataset composition, the process can be simplified into acquisition, refinement, and labeling, excluding pre-conditioning and dataset composition. This is because, when AI R&D is conducted by industry rather than academia, labeling is folded into the refinement procedure once the data are acquired in order to reduce cost and time. Moreover, the subdivided processes performed at research institutes are not followed: pre-developed models are chosen rather than developing learning models in-house, and the acquired data, such as suitable images or videos, swiftly undergo refinement and verification [20,21]. Verification here refers to the process of confirming, using manpower or automation tools, that refinement was carried out appropriately, not to forming the validation set generally cited in AI research.
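As a concrete illustration of this simplified acquisition–refinement–labeling flow, the sketch below strings the steps together in Python; the folder names, file pattern, and 512 × 512 target size are illustrative assumptions, not details taken from the projects described here.

```python
from pathlib import Path

from PIL import Image  # Pillow, assumed available

# Hypothetical layout: acquired files in data/raw, refined copies in
# data/refined. Labeling and verification would follow in an annotation
# tool, with a human check against the acquisition instructions.
RAW, REFINED = Path("data/raw"), Path("data/refined")
REFINED.mkdir(parents=True, exist_ok=True)

for i, src in enumerate(sorted(RAW.glob("*.jpg"))):           # acquisition output
    img = Image.open(src).convert("RGB").resize((512, 512))   # size/shape adjustment
    img.save(REFINED / f"sample_{i:05d}.jpg")                 # uniform renaming
```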
It seems that AI R&D could be carried out easily and quickly if the above procedures are followed, but two factors remain to be considered in AI research. Firstly, a great deal of high-quality research manpower and equipment is needed at the learning level. In order to verify the performance of training models, numerous variable studies need to be carried out, and researchers who can understand the model structure and change equipment to reduce the training time, as well as manpower to revise labeling, are required. Moreover, a number of studies compared the performance of various AI algorithms in order to verify the optimal model for construction projects [22,23,24]. For example, Chakraborty et al. [23] tested six different types of machine learning algorithm for predicting construction costs. Currently, South Korea lacks the high-quality manpower needed to properly understand and establish AI structures, and the available personnel are concentrated in only a few specific fields. Notably, in minor fields such as construction and civil engineering, the demand for R&D continues to emerge, but progress is slow due to a lack of human resources. Problems with development equipment also exist. Hardware, particularly the number of GPUs, is important in the development of the latest AI models [25]. AI calculation techniques using many GPUs in parallel exist, and, hence, the higher the quantity, the easier the variable research. However, research groups in minor fields such as construction and civil engineering appear to conduct research with equipment budgets of USD 4000~10,000 as of 2020 (as shown in Table 1), or to use services provided by Google or Amazon without owning equipment [26,27]. As discussed above, AI research may be delayed due to the low performance of computing equipment in non-mainstream sectors.
Secondly, more manpower is required for data acquisition and refinement than at the training level. There are limits to searching the web or reusing existing resources for data acquisition, and in special fields, procuring training data itself may be difficult. For instance, in order to acquire data on structural damage caused by disasters such as earthquakes and typhoons, research teams must invest time and financial resources at the right moments and acquire data directly on site. Even if they are lucky enough to acquire training data this way, refinement is necessary before studying the latest AI models, and this requires revising each data item with human involvement. Notably, data refinement that requires expertise may be delayed due to limited manpower.
Although companies carry out AI research by focusing on learning rather than on data acquisition and refinement in order to reduce costs, the outcome is imperfect, as they are unable to procure satisfactory training data. Fortunately, South Korea is investing considerably not only in AI model development research but also in data acquisition and refinement. In February 2021, the National Information Society Agency (NIA) attempted to standardize the development method by publishing manuals on data construction and refinement [28,29]. However, the types of data to be acquired consisted of images and video, and simply increasing quality without considering the development purpose was prioritized; as a result, it is difficult to use these data for anything other than their original acquisition purpose. If such data are to be used meaningfully later, the problem of re-refining the source data, again consuming manpower and financial resources, arises. Therefore, the aim of this study is to discuss the manpower-dependent problem of current image-based AI research, its improvement, and its future.

2. Current Research on Data Acquisition, Refinement, and Quality Evaluation

2.1. AI Research Imitating Human Visual Information

In general, an ordinary image or a frame cut from a video clip is good material for AI to learn from. As shown in Figure 2, until 2008, the early stage of data acquisition development, AI R&D focused on simple image classification. During this period, data acquisition did not require many human resources to create training data, since labeling consisted of simply adding text labels to images. Later, object detection was developed, which made it possible to enhance the accuracy of AI. In object detection, annotating images, that is, indicating the parts of each image to learn from, is an essential task [7]. In the early stage of annotation, bounding boxes were simply drawn on image classification data, but efficiency was enhanced once several objects could be identified in one image, as shown in Figure 3. Nonetheless, if image classification data are used without considering annotation, as in Figure 3a, the advantages of annotation cannot be exploited, which leads to no significant change in the accuracy of the AI. Consequently, researchers developing object detection had to acquire new image data corresponding to the method shown in Figure 3b, and they had difficulty redoing this atypical refinement to suit the AI model to be applied.
In 2012, researchers attempting to enhance the accuracy of AI models developed instance segmentation, a type of object detection that categorizes classes per pixel. Most researchers nowadays apply this method, and it shows excellent performance [6,7,8,9,10,11,12,13,14,15,16,17,18,19,30,31,32,33,34]. Since instance segmentation specifies the parts of the image needed for learning at the pixel level, labeling is mostly performed in polygonal form, as shown in Figure 4d, which requires a greater workload than the existing bounding boxes. To compensate, so-called auto-labeling methods began to be developed, ultimately reaching the level where labeling is carried out without the need to click on certain features, such as trees and cars [35]. Nevertheless, many errors occurred, so revision by personnel was needed, and perfect recognition of the target required additional image acquisition and learning. The additional amount of acquisition differs with the target accuracy, and the figures may vary depending on the engineer. Acquisition methods branch into online collection through web surfing and offline collection by photography [29]. Online collection is carried out by searching image servers on the web established by people or by automatic web crawling. Due to copyright problems, the industry prefers offline acquisition, and R&D institutes combine online and offline methods to reduce the development schedule and costs.
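To make the difference between box and polygon labels concrete, the snippet below shows one instance segmentation entry in the widely used COCO style [13] and derives the coarser bounding box from it; the ids and coordinates are invented for illustration.

```python
# One COCO-style annotation: a polygon traced click-by-click around an
# object, stored as a flat [x1, y1, x2, y2, ...] vertex list.
annotation = {
    "image_id": 1,
    "category_id": 3,  # hypothetical class id
    "segmentation": [[12.0, 40.5, 55.0, 38.0, 60.2, 90.0, 10.0, 88.5]],
}

def polygon_to_bbox(flat_points):
    """Collapse a polygon to [x, y, width, height]. The reverse is not
    possible, which is why box-labeled data cannot simply be reused for
    segmentation models."""
    xs, ys = flat_points[0::2], flat_points[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

print(polygon_to_bbox(annotation["segmentation"][0]))  # [10.0, 38.0, 50.2, 52.0]
```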
The acquired data are called source data and, in their current state, are not compatible with AI training. After data cleansing is conducted on the source data needed for training, the data must be revised to correspond with the model to be developed; the work incurred here is called data refinement. In brief, refinement includes various steps, from creating category folders and classifying images to adjusting image size, changing resolution, blurring for privacy, and binarizing to simplify color, and the images then go through labeling processes such as boxing to indicate the target object on the image and segmentation. Each task has a corresponding program tool, but none is fully automated, because the tasks cannot be completed in a single pass. The most frequent refinement tasks are changing the image size, adjusting the resolution, and renaming files; labeling can then be carried out once these tasks are completed. Throughout these procedures, manpower is constantly deployed. Furthermore, some cases may require expert knowledge of refinement and labeling depending on the research field, such as construction, civil engineering, and medical care. However, AI research results in such specific fields show low performance or are delayed because highly educated personnel are costly and limited.
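The individual refinement operations named above are typically small scripted steps. A minimal sketch, assuming OpenCV (opencv-python) is installed, might look as follows; the threshold, kernel size, and target resolution are illustrative choices rather than values from the study.

```python
import cv2  # OpenCV, assumed available (pip install opencv-python)

def refine_image(src_path: str, dst_path: str, size=(512, 512),
                 binarize=False) -> None:
    """Resize, optionally binarize to simplify color, and save under a
    new name (renaming happens via dst_path)."""
    img = cv2.imread(src_path)
    img = cv2.resize(img, size)
    if binarize:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, img = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    cv2.imwrite(dst_path, img)

def blur_region(img, x: int, y: int, w: int, h: int, k: int = 31):
    """Blur one rectangle (e.g., a face or license plate) for privacy."""
    img[y:y + h, x:x + w] = cv2.GaussianBlur(img[y:y + h, x:x + w], (k, k), 0)
    return img
```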

2.2. Research on Quality Assurance of Acquired Data and AI Models

Quality assurance can be categorized into two tasks: extracting appropriate data for training to enhance the accuracy of AI models and conducting performance evaluations of the developed AI models [28]. First, performance evaluation indicators for classification models, such as accuracy, the confusion matrix, precision, recall, F1 score, ROC AUC, and mAP, are based on judging the similarity between the actual and estimated data. In 2012, during the so-called embryonic stage of AI, quality assurance of training data was considered unimportant since the focus was only on the training model itself, but the necessity of quality evaluation rose as the amount of data continued to increase, and the importance of refinement and labeling was ascertained. If quality evaluation is included as a general business process, it can only be carried out by those who have knowledge of AI development, which implies that AI engineers would additionally have to bear the task of evaluating data quality before the training period. Recent research suggests that model accuracy can be enhanced by a high-quality training dataset alone; this burden would be resolved if research on training data evaluation indicators were conducted and the related manpower educated from now on, but a lack of AI development personnel would remain a problem in the short term.
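For the classification indicators listed above, a minimal sketch using scikit-learn on toy labels might look as follows; the label vectors are invented, and mAP, which applies to detection models, is usually computed with detection toolkits (e.g., pycocotools) rather than scikit-learn.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # toy ground-truth labels (assumption)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # toy model predictions (assumption)

print("accuracy :", accuracy_score(y_true, y_pred))
print("confusion:\n", confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
# ROC AUC is normally fed predicted scores; hard labels work but are coarse.
print("ROC AUC  :", roc_auc_score(y_true, y_pred))
```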

3. What Is the Digital Light Industry?

3.1. Manpower-Dependent Source Data Acquisition Method

AI studies are currently being actively carried out in many fields, but interest in acquiring training data is relatively small. Acquisition is mostly divided into collection via direct photography or video shooting and collection via web surfing. Many research engineers mistakenly regard the data acquisition process as easy and unimportant, based on the incorrect assumptions that the infinite data on the Internet can be used freely and that all objects in the surrounding environment are usable references. In fact, in specialized fields, acquiring a single datum requires bachelor's-level domain knowledge and corresponding verification. Even if one is fortunate enough to acquire the data, specialist constraints must be considered. In the medical field, for example, where there are numerous clinical cases, effort must be made to prevent the leakage of personal information when acquiring image data of wounds, conditions, etc. In the construction field, attention should be paid to not intruding on private property, and all other fields should give attention to copyrights, personal information leakage, and the security of the acquired images and data. Thus, even when acquisition itself proceeds smoothly, blurring parts containing personal information or deleting data violating copyrights from the collected source data in order to abide by R&D ethics regulations must still be carried out, and this can only be done manually by professionals. If the data are to be used for commercial purposes, much more detailed work is needed.
The digital surfing technique using search engines may appear easy and simple compared to the direct acquisition method, but professional engineers are using it less because of its many downsides. In order to transform collected data into source data, each datum must be inspected for duplication, irrelevance, format variation, resolution quality, and copyright issues, and this work demands as much research as direct acquisition does. Many beginners acquiring AI data believe that abundant training data exist on the web and prefer web-based acquisition, but once a certain quantity of data has been gathered, they become aware of this method's limits and eventually turn toward direct acquisition. Nevertheless, both methods are means of data acquisition that cannot be overlooked, and a considerable amount of manpower and financial resources is needed in this process due to its trial-and-error nature.
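As a rough illustration of this per-datum inspection, the sketch below flags exact duplicates by file hash and filters out low-resolution files from a web-collected folder; the folder layout and minimum-size threshold are assumptions, and copyright checks still require a human.

```python
import hashlib
from pathlib import Path

from PIL import Image  # Pillow, assumed available

def inspect(folder: str, min_w: int = 224, min_h: int = 224) -> list[Path]:
    """Return usable files, reporting exact duplicates and images below
    a minimum resolution."""
    seen: dict[str, Path] = {}
    keep: list[Path] = []
    for p in sorted(Path(folder).glob("*.jpg")):
        digest = hashlib.md5(p.read_bytes()).hexdigest()
        if digest in seen:
            print(f"duplicate of {seen[digest].name}: {p.name}")
            continue
        seen[digest] = p
        w, h = Image.open(p).size
        if w < min_w or h < min_h:
            print(f"too small ({w}x{h}): {p.name}")
            continue
        keep.append(p)
    return keep
```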

3.2. Necessity for Professional Manpower in Data Processing/Refinement/Labeling

As stated in the previous section, the processing and refinement of image data are generally conducted manually, as shown in Table 2. If categorizing the acquired data is considered processing and refining, those in charge of the database must classify images one by one at this step. The next steps, namely cleansing, size transformation, binarizing, and renaming, can be handled with simple program code by a programmer, but the database manager can carry out the procedures individually if programming personnel are not available. Bounding-box annotation and segmentation, which are conducted to utilize the latest AI methods, may require professional knowledge depending on the data type, and staffing every position with specialists may be cost-inefficient. This issue can be solved with an arrangement in which a manager with professional knowledge supervises many ordinary workers who carry out the processing and refinement.
During this step, a special program for labeling may be needed. Non-profit research institutes mostly use refinement programs based on open-source software, such as ‘Labelme’ [36] or the ‘VGG annotator’ [37], whereas business entities utilize self-developed programs and require manpower to manage them. Lastly, in order to manage refined data and utilize them for learning purposes, manpower professionally trained in AI development is needed, and specialists covering at least the five levels of knowledge in data processing, refinement, and labeling shown in Table 3 are required. Programs that automatically assist labeling are clearly being released as technology develops [35], and their use is expected to continue to diversify. At this point, however, human intervention cannot be removed from any of the steps from data acquisition and processing to refinement, and it can be inferred that the AI R&D industry is highly human dependent.
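For example, annotations produced with the open-source ‘Labelme’ tool [36] are stored as JSON; a minimal sketch for reading one such file might look as follows. The file name is hypothetical, but the keys (‘imagePath’, ‘shapes’, ‘label’, ‘points’) follow Labelme's format.

```python
import json

# Load one annotation file exported by Labelme (name is hypothetical).
with open("site_photo_0001.json") as f:
    ann = json.load(f)

print(ann["imagePath"])  # the image the polygons belong to
for shape in ann["shapes"]:
    # Each entry is one polygon a worker drew: class name plus vertices.
    print(shape["label"], len(shape["points"]), "vertices")
```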

3.3. Acquisition/Process Methods for Low Versatility Data

In general, AI development is not completed with a single pass of data refinement. To reach the level of commercialization, training data inevitably go through several rounds of re-processing, refinement, and labeling, and uneven data management may occur due to the participation of numerous workers. There are three main reasons for re-processing and refinement. The first is labeling carried out in a manner that does not follow the engineer's and data managers' instructions, even when performed by professionals; in this case, a series of refinements must be conducted again in accordance with the development purpose. The second is low accuracy during the training period, or a classification model that categorizes unwanted objects. Unlike problems occurring during model training, the cause cannot be pinpointed in this case, and re-processing and refinement of the overall data may be required, which greatly affects the project schedule. This could be due to low precision in the refinement and labeling process (labeling unnecessary data or inserting data that diverge from the development intention) or an insufficient amount of training data. The third is the development of a new AI network. Since training data are composed of a bundle of categories, it is not difficult to add or delete categories, but it is difficult to use the bundled data as they are for the development of an AI with a new purpose. For instance, construction material could be a product on-site or rubbish in a dumpsite depending on its usage. Although an AI perceiving materials has been successfully developed, it should also be able to differentiate between new and used products for resource management, and, in this case, the existing categories must be revised. That is, just as human beings see and judge, data refinement should allow the same product to be discerned as different categories depending on the situation. The research team's trial suggests that the workload lessened considerably when labeling anew from the source data rather than editing the existing labeled data. AI research and projects currently under development do not consider subcategories such as these, and the training data they collect and refine are not expected to be usable in further developments, ultimately becoming waste.
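A brief sketch of the third cause, using the construction-material example above: splitting an existing class by condition cannot be automated when the original labels never recorded that attribute. The class names are illustrative, not the study's actual scheme.

```python
def remap_label(old_label: str, condition: str | None) -> str:
    """Split the old 'rebar' class into 'rebar_new' / 'rebar_used'.
    Without a recorded condition attribute, every old polygon must be
    revisited by a human, which is why relabeling from source data can
    be cheaper than editing existing labels."""
    if old_label == "rebar":
        if condition is None:
            raise ValueError("manual review needed: condition unknown")
        return f"rebar_{condition}"
    return old_label  # other classes carry over unchanged

print(remap_label("rebar", "used"))  # -> 'rebar_used'
```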

3.4. Necessity of Acquisition/Refinement/Labeling Manager

The acquisition/refinement/labeling manager can be defined as an individual who possesses professional knowledge of the specialized field and is capable of making decisions regarding the use of data in training. The advantages gained from hiring such personnel are a shortened development schedule and increased accuracy. If the performance of a model trained with data refined in the absence of a manager is low, it is difficult to determine whether the cause lies in the training material or in the AI network. In this case, the AI engineers have to identify errors within the training data and improve the network simultaneously, which can double the research and development schedule. On the other hand, if a data refinement manager with knowledge of data quality provides the refined data, the AI engineer can concentrate solely on developing the AI network, which may reduce R&D time.

4. Solutions

As discussed in Section 3, AI R&D requires many human resources. For AI in specialized fields, the accuracy of the model can only be ensured by highly educated personnel, but such manpower is usually lacking. Moreover, data acquired and refined for the immediate development purpose, with no consideration of expansion, are used only once, like disposable products. To avoid such losses, it is necessary to train human resources to consider and manage expansion from the beginning of development.

4.1. Solution to the Manpower-Dependent R&D Method

A large share of South Koreans attend university, and the population's computer skills are among the world's best, so the general human resources required for AI R&D can be considered abundant. However, AI R&D is currently being carried out inefficiently due to a notable shortage of trained manpower, and this issue will only be resolved when AI is popularized, which marks the beginning of the fourth Industrial Revolution. If the most efficient R&D structure is adopted, Korea can be expected to quickly become a powerhouse in this field.
Manpower-related issues can be addressed by having an AI expert select the detailed learning method and oversee data acquisition, refinement, and labeling. Although it is ideal to plan with further extendibility in mind, an expert cannot do this alone and thus requires AI personnel and an additional general manager. To solve this problem, a new type of manpower distribution for the AI development process is suggested, as shown in Figure 5. The structure and manpower figures are empirical values obtained by the research team. Aside from their own expertise, general managers play a supporting role, ensuring that R&D proceeds successfully by understanding the scheme of the AI professionals and managing manpower. Should supporting human resources be insufficient, acquisition, refinement, and labeling managers should be appointed through professional training to administer the workforce, while AI engineers focus on enhancing the capacity of the model. Ultimately, this is no different from the division of labor observed during the Industrial Revolution, and because productivity rose dramatically then as a consequence, AI R&D is expected to produce the same results. For this, large-scale learning programs and support are needed so that ordinary researchers can become advanced ones through proper education.

4.2. Solution to Concerns Regarding Data Waste

The number of images per project acquired by the Korean government's Data Dam establishment project is about 200 k~500 k, at a cost of around KRW 1.8 billion. Whether these data can be used for other development purposes should be reconsidered.
According to a case study by the research team, adding even one simple classification category required the same amount of effort as the existing refined quantity had. Regardless of the selected AI development method, new category classification and labeling methods must be applied to the model to be developed, and even datasets developed with extendibility in mind require half the pre-existing manpower. Therefore, data should be acquired and refined to possess generality: establishing detailed categories, organizing data with extendibility reflected, adopting suitable labeling techniques, and starting with a broad development range from the beginning, thereby reducing cost and time losses.
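One way to reflect this extendibility, sketched below, is to record fine-grained subcategories at labeling time and collapse them per project; the taxonomy is an illustrative assumption, not the scheme used in the study.

```python
# Hypothetical fine-grained taxonomy recorded at labeling time.
TAXONOMY = {
    "concrete": ["cracked", "intact"],
    "rebar": ["new", "used"],
    "lumber": ["product", "waste"],
}

def coarse(label: str) -> str:
    """Collapse 'rebar/new' -> 'rebar' for a model that only needs the
    parent class; the fine label stays available for later projects."""
    return label.split("/")[0]

assert coarse("rebar/new") == "rebar"
```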

5. Case Study and Suggestion Regarding Work Index

Three case studies were carried out to support the research team's argument that AI R&D resembles a type of light industry. The case studies were carried out in a manner similar to the development method used by general researchers, and the manpower and hours consumed per development step can be observed in Table 4. Equation (1), presented below, relates image quantity, input manpower, and work hours.
Work index = (Total amount of data) / (Degree of input manpower × Work hours)    (1)
Through this equation, resource consumption relative to the amount of data can be quantitatively expressed as a value from 1 to 10, depending on the AI development method. If a large amount of manpower and many work hours are input, the index takes a low value, and if the data generated in further R&D processes are analyzed, the adequacy of the resources input into data acquisition, refinement, and labeling can be verified.
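As a minimal check of Equation (1), the function below reproduces two of the per-step values reported in Table 4 for Case B (construction waste):

```python
def work_index(total_data: int, manpower: int, hours: float) -> float:
    """Equation (1): data volume per unit of manpower-hours."""
    return total_data / (manpower * hours)

print(round(work_index(866, 3, 48), 1))   # acquisition step -> 6.0
print(round(work_index(866, 4, 180), 1))  # labeling step    -> 1.2
```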

5.1. Development on Concrete Surface Crack Recognition

In order to use the object detection method, a total of 20,000 photos showing concrete cracks at 224 × 224 pixels were collected. The concrete cracks were acquired by photographing actual structures, and the cracks in the images were cropped and saved. Data acquisition and cropping took one worker 200 h. Special labeling was not required; thus, the work index was 1.4 against a considerable amount of learning data. As a result of learning, the AI model could perceive cracks. However, because the acquired learning images were concentrated on horizontal or vertical cracks, diagonal cracks were identified, but cracks intersecting in an X shape were unrecognizable.

5.2. Development in Perceiving Constructional Waste

In order to develop an AI model that differentiates 5 of the 18 types of construction waste generated on construction sites, namely concrete, brick, lumber, board, and mixed waste, data were acquired and refined using instance segmentation. A total of 866 images were acquired: those taken on-site were cropped, others were collected online at a resolution of 512 × 512 pixels, and all were refined to 100 kB per photo. Moreover, pixels along the object boundaries were extracted using the polygonal method so as to differentiate the waste classes in the images. Data acquisition took three workers 48 h, including everything from planning the acquisition process to taking pictures on-site and moving and saving them on a data server. Refinement and labeling were carried out simultaneously, taking four workers 180 h. The work index differed for each development step, as shown in Table 4, and it can be observed that many resources were spent on data acquisition in this case. Since inaccuracy was observed in some parts after learning, four workers needed around 60 additional hours for relabeling (segmentation) based on the acquired source data. Relabeling required only one-third of the resources of the original work, and it was possible to differentiate the five types of waste after learning with the relabeled data, but it was ultimately decided to increase the amount of learning data because board and lumber were being confused with each other.

5.3. Development of Model Identifying the Quantity of Rebar

A model to identify individual bundles of rebar was developed for automated material management on a construction site. Within images of 512 × 512 pixels, 50–250 bundles of rebar were photographed, and the rebar cross-sections were extracted using the polygonal method so as to perceive each rebar separately. Acquiring the 726 images took three workers 48 h, and segmenting around 120 k rebars individually took four workers 100 h. As a result of learning, the extraction of rebar quantity from a photo, achieved by identifying each rebar cross-section individually, was confirmed. However, when more than 300 rebars were included in one image, some were difficult to differentiate at 512 × 512 pixels even with the naked eye; thus, a notable decrease in accuracy was observed in low-resolution parts. This case study tended to show a high work index compared to the others, which is attributed to no additional labeling being required and the development being carried out efficiently by a skilled research team.

6. Conclusions

The fourth Industrial Revolution refers to an intellectualized society built on various technologies, such as AI, big data, blockchain, and robotics. This study focused on AI among the technologies of the fourth Industrial Revolution and examined the similarities between its development process and that of light industry. Compared with heavy industry, the production efficiency of light industry depends on manpower and production stages. All processes of AI R&D, from data acquisition to refinement, learning, and quality evaluation, need to be carried out by personnel, an observation that applies equally to light industry. However, since existing business entities and institutes approached development by focusing only on the AI learning step, manpower supply was not prioritized, which led to a low assessment of the importance of human resources in national R&D projects and thus to improper AI R&D. Therefore, this study conducted a detailed analysis of the tasks necessary for each AI learning step and examined the urgency of training R&D personnel.

6.1. Data Acquisition Step

Previously, images or time series data were simply copied into digital form for data acquisition. However, for R&D with commercialization purposes or conducted in specific professional fields, it is difficult to use existing data indiscriminately due to copyrights and personal information protection, and a process of checking the data individually after acquisition is mandatory. Tools to automate this process exist but cannot be considered complete; thus, human intervention is necessary throughout the entire process.

6.2. Data Refinement/Labeling Step

The case studies verified that improvements in the accuracy of the AI model and reductions in the research period could be expected from the enhancement of refinement technology alone. It was observed that a refinement manager and development personnel with academic knowledge of the development field were necessary, and it was important that the data were refined by personnel who had received professional education in the field rather than by the general public recruited through existing crowdsourcing methods. In addition, a refinement manager should be hired to manage R&D projects so that AI engineers can focus on their own tasks, thereby maximizing the efficiency of the division of labor.

6.3. Quality Evaluation Step

According to the research results, managers could carry out quality evaluation at each step. However, this could lead to quality evaluation focused only on the immediate research purpose, making the reutilization of learning data impossible. Although it is inessential at this point, the reutilization of learning data could serve as a shortcut that reduces the development period for researching and developing similar AI once the golden age of AI research is reached. At that point, professional manpower able to anticipate developments in other fields and manage data accordingly would become important.
If a sufficient number of personnel are educated and trained to lead specialized development projects, as per the results above, new value creation in the AI era can be expected. Such an AI industry would increase job opportunities in the short term, and job opportunities in general are not expected to decrease due to AI until a particular point is reached. The division of labor and the mechanization that emerged during the second Industrial Revolution brought explosive productivity enhancement. The fourth and second Industrial Revolutions show similarities in terms of raw materials (data) and machinery (computers). A considerable number of examples have verified the increased productivity achieved by using AI, and if human resources are efficiently distributed and utilized, an advancement for mankind similar to that observed during the second Industrial Revolution can be expected.

Author Contributions

Conceptualization S.H. (Seokjae Heo) and S.N.; methodology S.H. (Sehee Han), Y.S. and S.N.; formal analysis S.N., S.H. (Seokjae Heo), and S.H. (Sehee Han); resources S.H. (Sehee Han); project administration S.N.; resource, S.H. (Seokjae Heo) and Y.S.; funding acquisition S.H. (Seokjae Heo); writing—original draft S.N. and Y.S.; writing—review and editing, S.N. and Y.S.; visualization: S.H. (Seokjae Heo) and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government Ministry of Education (No. NRF-2019R1A6A301091459 and NRF-2020R1C1C1005406).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the results of this study are included within the article. In addition, some of the data in this research are supported by the references cited in the article. For any queries regarding the data, they are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Revaud, J.; Heo, M.; Rezende, R.S.; You, C.; Jeong, S.-G. Did it change? Learning to detect point-of-interest changes for proactive map updates. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
2. Kim, B.; Lee, J.; Kang, J.; Kim, E.-S.; Kim, H.J. HOTR: End-to-End Human-Object Interaction Detection with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021.
3. Roh, B.; Shin, W.; Kim, I.; Kim, S. Spatially consistent representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021.
4. King, T.M.; Arbon, J.; Santiago, D.; Adamo, D.; Chin, W.; Shanmugam, R. AI for testing today and tomorrow: Industry perspectives. In Proceedings of the 2019 IEEE International Conference on Artificial Intelligence Testing (AITest), Newark, CA, USA, 4–9 April 2019.
5. NIA National Information Society Agency. NIA2020-032. 2020. Available online: https://www.nia.or.kr/ (accessed on 27 September 2021).
6. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
7. Felzenszwalb, P.; McAllester, D.; Ramanan, D. A discriminatively trained, multiscale, deformable part model. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008.
8. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
9. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
10. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
12. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
13. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision; Springer: Dordrecht, The Netherlands, 2014.
14. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, 20–27 September 1999.
15. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
16. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
17. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
18. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019.
19. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017.
20. Kim, J.M.; Hyeon, S.G.; Chae, J.H.; Do, M.S. Road crack detection based on object detection algorithm using unmanned aerial vehicle image. J. Korea Inst. Intell. Transp. Syst. 2019, 18, 155–163.
21. Lee, B.-Y.; Yi, S.-T.; Kim, J.-K. Surface crack evaluation method in concrete structures. J. Korean Soc. Nondestruct. Test. 2007, 27, 173–182.
22. Abioye, S.O.; Oyedele, L.O.; Akanbi, L.; Ajayi, A.; Delgado, J.M.D.; Bilal, M.; Akinade, O.O.; Ahmed, A. Artificial intelligence in the construction industry: A review of present status, opportunities and future challenges. J. Build. Eng. 2021, 44, 103299.
23. Chakraborty, D.; Elhegazy, H.; Elzarka, H.; Gutierrez, L. A novel construction cost prediction model using hybrid natural and light gradient boosting. Adv. Eng. Inform. 2020, 46, 101201.
24. Elhegazy, H.; Chakraborty, D.; Elzarka, H.; Ebid, A.M.; Mahdi, I.M.; Haggag, S.Y.A.; Rashid, I.A. Artificial intelligence for developing accurate preliminary cost estimates for composite flooring systems of multi-storey buildings. J. Asian Archit. Build. Eng. 2021, 1–13.
25. Baji, T. Evolution of the GPU device widely used in AI and massive parallel processing. In Proceedings of the 2018 IEEE 2nd Electron Devices Technology and Manufacturing Conference, Kobe, Japan, 13–16 March 2018.
26. Na, S.; Heo, S.-J.; Han, S. Construction waste reduction through application of different structural systems for the slab in a commercial building: A South Korean case. Appl. Sci. 2021, 11, 5870.
27. Seok-jae, H.; Dae-ho, M.; Bang-yeon, L.; Sang-hyun, L. Detection of plane member deformation using moire fringe and recognition using machine learning. J. Wind Eng. Inst. Korea 2018, 22, 25–32.
28. Telecommunications Technology Association. AI Learning Data Quality Management Guidelines v1.0; Ministry of Science and ICT: Gwacheon, Korea, 2021.
29. Telecommunications Technology Association. AI Dataset Construction Guidebook; Ministry of Science and ICT: Gwacheon, Korea, 2021.
30. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
31. Tuzel, O.; Porikli, F.; Meer, P. Region covariance: A fast descriptor for detection and classification. In Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006.
32. Lazebnik, S.; Schmid, C.; Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006.
33. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005.
34. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001.
35. Acuna, D.; Ling, H.; Kar, A.; Fidler, S. Efficient interactive annotation of segmentation datasets with Polygon-RNN++. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
36. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 2008, 77, 157–173.
37. Dutta, A.; Gupta, A.; Zissermann, A. VGG Image Annotator (VIA); University of Oxford: Oxford, UK, 2016.
Figure 1. The current AI R&D process in academia and in the industrial field, and the manpower required for development (if 10 personnel are needed in total, 50% of them are for data acquisition, processing, refinement, and labeling).
Figure 2. Milestones of object detection techniques and the evolution of hardware (GPUs).
Figure 3. Difference between (a) an example of learning data collected in the era when image classification techniques were dominant and (b) learning data collected after object detection was developed. The data in (a) can also be used for object detection, but they are less efficient and may lead to inaccurate results.
Figure 4. (a) Classification (0 clicks); (b) Classification + Localization (0 clicks); (c) Object Detection (8 clicks); (d) Instance Segmentation (58 clicks). Typical artificial intelligence methods for perceiving images; the number in parentheses is the number of mouse clicks required. License plates and human faces were mosaicked or blurred for privacy. As the ability to perceive various objects in one image developed, the number of mouse clicks and the refinement time increased while the required volume of the training dataset decreased, quickening learning speed; comparative studies on this trade-off are thus needed.
Figure 5. The current human resource structure used in the AI development process and the improved manpower distribution method proposed by the researchers.
Table 1. Status and cost of AI research equipment for architectural and civil engineering studies at universities in South Korea (2020).

University | Major | Amount of Manpower | Hardware | Cost
D | Civil Engineering | 3 | RTX3080 × 4 | USD 12,000
  | Environmental Engineering | 3 | RTX2090 × 3 | USD 6,000
S | Environmental Engineering | 5 | RTX2060 × 4 | USD 4,000
H | Construction | 2 | Google API | USD 10/month
Table 2. Parts to be considered for verification in data acquisition and cleansing.

Parts to Verify | Processing Method
Copyright infringement | Data deletion
Personal information infringement | Blur image, delete personal information
Security facilities and public facilities | Acquire data upon prior permission
Discrimination against a particular region, society, or race | Set up procedure to avoid inequality
Possibility of safety-related accidents | Prior check on accident risks and provision of safety education
Table 3. Research process, human resources, and level of academic knowledge required for AI research and development prior to learning.

Step | Occupational Group | Level of Academic Knowledge
Data acquisition | DB manager | Low level of computer programming major
Data acquisition | Worker | Major not required
Data refinement | Programmer | General level of programming major
Data labeling (annotation) | Labeling manager | High level of special field major
Data labeling (annotation) | Worker | Major not required
Refinement/labeling program development | Programmer | High level of programming major
Training data transformation | AI engineer | High level of AI major
Table 4. ‘Work index’ on the basis of required manpower and hours depending on the AI R&D method.

Case | Method | Amount of Data | Step | Manpower | Working Time (h) | Work Index | Avg.
A. Crack | Classification | 20,000 | Acquisition/labeling | 1 | 200 | 1.4 | 1.4
B. Construction waste | Instance segmentation | 866 | Acquisition | 3 | 48 | 6.0 | 3.3
  | | | Labeling | 4 | 180 | 1.2 |
  | | | Relabeling | 4 | 60 | 3.6 |
C. Rebar | Instance segmentation | 726 | Acquisition | 3 | 48 | 5.0 | 4.1
  | | | Labeling | 4 | 110 | 1.65 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
