Article

Analytical Evaluation of Midjourney Architectural Virtual Lab: Defining Major Current Limits in AI-Generated Representations of Islamic Architectural Heritage

by Ahmad W. Sukkar 1,*, Mohamed W. Fareed 1, Moohammed Wasim Yahia 1, Salem Buhashima Abdalla 1, Iman Ibrahim 2 and Khaldoun Abdul Karim Senjab 3

1 Department of Architectural Engineering, College of Engineering, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates
2 Department of Applied Design, College of Fine Arts and Design, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates
3 Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates
* Author to whom correspondence should be addressed.
Buildings 2024, 14(3), 786; https://doi.org/10.3390/buildings14030786
Submission received: 30 November 2023 / Revised: 1 January 2024 / Accepted: 29 January 2024 / Published: 14 March 2024
(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract: In artificial intelligence (AI), generative systems, most notably Midjourney, have tremendous power to generate creative images of buildings and sites of Islamic architectural heritage through text-to-image generation based on internet-sourced data. The AI-generated representations have significant potential for architects, specialists, and everyday users. However, the system has considerable limitations when generating images of some buildings and sites, where the representations diverge considerably from the actual structures they are meant to depict. Evaluating Midjourney as an architectural virtual lab, this research article attempts to define the major current limitations of using Midjourney’s AI system to produce images similar to the actual buildings and sites of Islamic architectural heritage. This research employs prompt engineering techniques based on historical sources as inputs to examine the accuracy of the output AI-generated images of selected examples of structures of the Islamic tradition. To validate the research results, it compares the Midjourney output with the original appearance by employing direct observation through critical analysis of human intelligence (HI), facilitated by analysis with the latest version of the 3M Visual Attention Software and by an online survey. It concludes that while Midjourney can produce high-end AI-generated images inspired by the Islamic tradition, it currently falls short of presenting the actual appearance of some of its original structures. Furthermore, it categorizes these limitations into four groups: (1) limits of the prompt, (2) limits of fame, (3) limits of regionality and historical styles, and (4) limits of architectural elements and details. The significance of this research lies in providing timely evaluation factors to architectural heritage practitioners and software developers. This practical article is the second in a series of two sequential articles in the Buildings journal; the first (theoretical) article introduces the concept of Artificial Intelligence Islamic Architecture (AIIA), answering the question: what is Islamic architecture in the age of artificial intelligence?

1. Introduction

(This practical article continues a discussion begun in a previous theoretical article [1]. Reading the theoretical article first is recommended; however, this introduction provides enough analysis to make this article self-contained.)
Many heritage buildings lack comprehensive documentation, such as accurate floor plans, elevations, and other architectural details. They often undergo modifications and renovations over time, making it difficult to determine their original state. The limited data pose a significant challenge for architects, historians, and preservationists who aim to represent and restore these structures accurately [2,3]. Innovative digital solutions, such as virtual and augmented reality and, more specifically, Heritage Building Information Modelling (H-BIM) [4,5,6], help overcome this limitation by visualizing the known elements of architectural heritage digitally, revealing the bigger picture and opening opportunities to predict the missing parts. In this process, it is crucial to evaluate the accuracy of the resultant models and the reliability of the data source [7]. However, accuracy is closely tied to the availability of reliable resources, such as archival documents and previous research, which can enhance the reliability of the outcome [8]. In this domain, AI image generation offers new opportunities to explore traditional architectural styles by blending them with modern elements or reinterpreting them in futuristic and fluid forms. The advancement of AI has the potential to make substantial contributions to this aspect of heritage architecture.
This article centers on the artificial intelligence (AI) tool Midjourney, which leverages text-to-image technology within this frame of reference. The AI-powered Midjourney system facilitates users’ input of textual descriptions and generates corresponding visual representations. This tool boasts a user-friendly interface and operates through a prompt-driven procedure. It generates personalized visual expressions based on the textual prompts provided by users. It accomplishes this task by employing an extensive tagged image repository, considering object names, styles, and environmental conditions to construct the desired image [9,10,11].
Through experimentation with the AI tool Midjourney, this article aims to evaluate Midjourney as an architectural virtual lab and to define some major current limits in its AI-generated representations of Islamic architectural heritage. The research questions include:
  • To what extent can Midjourney produce images of buildings and sites close to their original forms?
  • What are this tool’s limits in producing these images, and what are the various intertwined factors that affect these limits?
The corresponding objectives of this article include the following:
  • Evaluating the ability of Midjourney to produce images of buildings and sites close to their original forms.
  • Defining the limits of this tool in producing these images and the various intertwined factors that affect these limits.
The significance of this research lies in providing timely evaluation factors for these important and increasingly widespread tools to architectural heritage practitioners and software developers.

2. Literature Review

2.1. Technical Dimensions

The technical dimension of recent advancements in image processing and computer vision, driven by AI and deep learning, is explored in the context of text-to-image generation. This process generates visually realistic images from textual descriptions, distinguishing itself from image captioning. The significance of text-to-image (T2I) generation lies in its versatile applications, including photo searching, art generation, and industrial design. Generative Adversarial Networks (GANs) play a pivotal role in T2I generation, employing principles from game theory to train a generator and discriminator iteratively. The recurrent convolution GAN (RC-GAN) has been introduced as a framework for synthesizing images from human-written textual descriptions [12]. The GAN architecture comprises a generator and a discriminator network engaged in adversarial competition. The training process involves maximizing the objective function from the discriminator’s perspective while minimizing it from the generator’s perspective. GANs aim to generate high-quality synthetic data that closely resemble real data, showcasing exceptional performance in various tasks.
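For reference, the adversarial training described above corresponds to the standard minimax objective of the original GAN formulation (the notation here is generic background, not reproduced from the cited RC-GAN paper [12]):

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

where the discriminator D seeks to maximize V(D,G) while the generator G seeks to minimize it.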
Conditional GANs (cGANs) extend the GAN framework by generating data conditioned on auxiliary information. The training process imposes conditions on both the generator and discriminator networks, allowing the generation of images corresponding to specific inputs. Controllable GANs, such as ControlGAN, separate the classifier from the discriminator, addressing issues in condition-based methods like ACGAN. Text-to-image synthesis generates images from textual descriptions, with GAN-based models like GAN-INT-CLS utilizing the conditional GAN architecture for high-quality image synthesis [13].
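In the conditional setting sketched above, both networks additionally receive the conditioning information y (for text-to-image synthesis, typically a text embedding), which is commonly written as follows (again a generic textbook form, not a formula taken from the cited work [13]):

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x\mid y)\big] + \mathbb{E}_{z\sim p_{z}}\big[\log\big(1 - D(G(z\mid y)\mid y)\big)\big]$$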
Text-to-image AI models, driven by advanced deep neural networks, translate natural language descriptions into images. In addressing authenticity assessment challenges, GANs, autoencoders, and variational autoencoders (VAEs) are discussed. Various architectures, including recurrent neural networks (RNNs) and transformer models, are employed in text encoding. Diffusion models, gaining prominence in image generation, excel in denoising, inpainting, and super-resolution tasks. Advancements in machine learning algorithms and hardware, especially GPU usage, contribute to the efficiency of training GANs and diffusion models. The ongoing debate on authorship in AI-generated content is highlighted, emphasizing the need for responsible integration of these models in creative practices [14].
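For background, the denoising behaviour of diffusion models mentioned above is typically trained with the simplified objective of the DDPM formulation, in which a network \(\epsilon_\theta\) learns to predict the noise added to a clean image \(x_0\) at step \(t\); this is general literature context rather than the specific loss of any system evaluated in this article:

$$L_{\text{simple}} = \mathbb{E}_{t,\,x_{0},\,\epsilon\sim\mathcal{N}(0,I)}\big[\lVert \epsilon - \epsilon_{\theta}(x_{t},t)\rVert^{2}\big], \qquad x_{t} = \sqrt{\bar{\alpha}_{t}}\,x_{0} + \sqrt{1-\bar{\alpha}_{t}}\,\epsilon$$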

2.2. Recent Approaches

Previous academic literature has started to explore various aspects of specific text-to-image AI systems, although this body of work is still limited and in an early stage of formation. These studies cover multiple topics, such as computational thinking principles, evaluation of AI-generated images, prompt engineering for text-to-image generation, creativity in text-to-image art, and perception and communication in human–AI co-creation. Repenning and Grabowski [15] introduced the concept of prompting as computational thinking, providing a framework to help users navigate the complexity of AI tools while emphasizing the social aspect of sharing examples and suggestions within user communities. Göring et al. [16] evaluated the appeal of realistic AI-generated images, finding variations among different generators and a need for specific models to assess these images. In the context of text-to-image generation, Ruskov [17] explored prompt engineering to produce illustrations for famous fairytales, offering insights into the challenges generation models face. Oppenlaender [18] examined the nature of human creativity in text-to-image art and prompt engineering, highlighting the importance of online communities and discussing evaluation challenges. Lyu et al. [19] delved into how humans collaborate with AI and perceive the generated results, emphasizing emotional communication and the influence of the artistic background.
Chen et al. [20] have discussed the complexity of evaluating generative architectural design images, necessitating comprehensive metrics beyond conventional methods. Collaboration with senior architects led to eight nuanced evaluation items, highlighting the critical importance of “overall impression” and “consistent architectural style” in design assessment. These authors emphasized the need for future work, including expanding architectural style datasets, enhancing text-to-image guiding words, and exploring the model’s feasibility for generating trimodal models or videos.
In a study by Zhang et al. [21], a comprehensive comparison between designs by Antoni Gaudí and an AI system was conducted using five key metrics: Authenticity, Attractiveness, Creativity, Harmony, and overall Preference. Gaudí’s designs excelled in Authenticity and Harmony, while the AI-generated designs showed promise in Attractiveness and Creativity. These authors acknowledge the challenges AI faces in replicating distinct human design styles and highlight the subjectivity in design evaluations. Their article suggests AI’s role as a tool in architectural design, offering diverse solutions and fostering innovation. However, it recognizes limitations in replicating complex, human-centric elements, supporting the importance of cultivating personal design languages in architectural education.
Given the relatively limited formal academic literature on the particular, newly emerged topic of Midjourney, the following practical examples help clarify the case and set the scene. In a design study conducted by the UK-based real estate group GetAgent, a series of fifteen visualizations was created using Midjourney, reimagining iconic buildings in various architectural styles and showing how they might have appeared if designed in different eras or by other architects. This application challenged conventional thinking about architecture, encouraging experimentation and inspiring a broader conversation about its role in shaping our environment [22]. Another notable example is the work of the architectural designer Kaveh Najafian, who employed Midjourney to create a visually striking series entitled Flying Versailles. The project reimagined the iconic Versailles palace with maximalist and decorative elements such as feathers and gold facades. By continuously refining textual prompts, Najafian achieved intricate details, showcasing the potential of AI technology to empower designers [23]. In a design workshop dedicated to envisioning the future of habitation on the island of Capri, Italy, designers employed a combination of conventional techniques and Midjourney. They aimed to produce visually captivating postcards depicting a Mediterranean island that would not only allure viewers but also evoke small-scale urbanism, seamlessly blending public spaces and agriculture, all without the need for exhaustive research. The resulting proposals, varying from modest to surreal, effectively showcased AI’s potential in reimagining historic architecture and urban forms [24]. In another case, the architect Rolando Cedeño de la Cruz employed AI to reimagine an ancient architectural typology by creating a modern art center inspired by Mesopotamian ziggurats. The Midjourney-generated series presented a contemporary interpretation characterized by earth-toned exteriors and light-filled interiors [25].
Islamic architecture is experiencing a significant transformation driven by the increasing integration of computational tools, particularly AI, mirroring broader trends in the architectural landscape. AI presents valuable design possibilities for elements such as arabesque and muqarnas, particularly in image generation. However, applying AI in this context necessitates careful consideration within Islamic architecture’s nuanced historical and cultural framework. While AI excels in reimagining tangible architectural features, it has limitations in capturing intangible heritage aspects, such as rituals and oral traditions. It is imperative to acknowledge that the human touch, cultural sensitivity, and craftsmanship inherent in Islamic architecture are irreplaceable and should not be overshadowed by technological advancements. The intersection of AI and the representation of Islamic architecture, especially in the context of mosque design, remains an underexplored frontier in the quest for a modern portrayal of the spiritual essence. Existing limitations, exemplified by tools like Midjourney, underscore the need for a more comprehensive understanding of Islamic architectural heritage’s geographical and temporal diversity. Studying Islamic architecture through AI-generated images is a pressing necessity, demanding a delicate balance between technological innovation and the preservation of cultural intricacies. In navigating this intersection, it is crucial to recognize AI as an augmenting force for human creativity rather than a substitute, emphasizing the absolute need to harness AI’s potential to complement, rather than overshadow, the richness of Islamic architectural traditions.
Our article stands out from the existing literature by being the first to apply analytical experimentation specifically to the case of Islamic architecture in connection with the AI Midjourney tool. While other studies have explored the capabilities and limitations of AI-generated representations in various important contexts, this research focuses on a niche area, Islamic architectural heritage, bringing attention to this domain’s unique challenges and constraints.
Our article contributes to the academic community by shedding light on the current limitations of AI image generators in accurately representing Islamic architectural heritage. This insight is valuable for architects, specialists, and researchers interested in utilizing AI-generated images in Islamic architecture for academic, historical, or creative purposes. The findings can inform future developments in AI systems and prompt improvements to address the identified limitations, fostering a more accurate and reliable representation of Islamic architectural heritage through AI technology.

3. Methods and Materials

To address this gap in defining the limits of utilizing an AI-driven text-to-image generation tool for Islamic architectural heritage, this article leverages prompt engineering techniques, using historical sources in this field as inputs to the Midjourney platform, which requires a paid monthly subscription. To achieve this task, this article employs a range of diverse prompts encompassing various aspects of Islamic architectural heritage. The images generated by the AI image generator are then analyzed through direct observation and critical analysis by HI, comparing them to real photos to explore the accuracy limits of the outcomes.
This article is exploratory, employing a mixed experimental and observational methodology defined by the combination of AI and HI methods. The authors used the Midjourney website as a virtual lab, where the variables are the prompts and the samples are defined by the specific buildings and sites referred to in the prompts. However, the study is methodologically observational in that it does not affect the model or the non-editable internet data from which Midjourney created its dataset; instead, it generates its samples through the prompts. It is worth mentioning that Midjourney does not refer to particular original images to generate its AI images but only provides the outcome of the prompt in the form of a set of four images, from which the user can pick one for a second round of iterations. In the case of this research, the second iteration did not provide dramatically different results, only minor variations on the picked images. We used the first iteration of images as the references and subjects of our study because they relate directly to the studied prompts, which we share in the captions of the images, and because they significantly define the direction of any further image iterations.
While experimental and observational studies are usually considered quantitative, as they measure different quantities of the outcome, this article is better described as qualitative, judging the quality of the product. The pros and cons of these methods are embedded in it. Given the cons of the methods used, the results are indicative rather than deterministic, as they are subject to opinion-based evaluation instead of absolute measurements. The prompts are provided to allow for reproducibility by other scholars. However, prompting with Midjourney does not allow the exact reproduction of outcomes, as the results depend on the time they were entered into the Midjourney platform and on the sophisticated randomness embedded in its algorithm and generation process. Consequently, entering the same prompt, even simultaneously, by two different users might not necessarily produce similar, let alone identical, results. In its judgment, however, this article depended on a pattern of somewhat similar repetitive outcomes to generalize its conclusions. Research in this field that addresses reproducibility includes [16,26].
However, it is worth mentioning that scholars such as Gibney [27] highlighted concerns among scientists about AI contributing to a reproducibility crisis in research. Machine learning’s potential for uncovering hidden patterns is clear, but there are worries about misleading claims, misuse of generative AI, and biased outcomes. Researchers have emphasized the importance of proper AI training and education, advocating for interdisciplinary collaboration and standardized reporting checklists. While some believe in the maturation of the field and responsible AI use over time, this article emphasized the need for vigilance, training, and cultural shifts to ensure reliable and reproducible scientific research involving AI. Kang et al. [28] also mentioned that AI models may misinterpret prompts, generating themes that reflect specific identities without explicit input, raising concerns about reinforcing biased discourse. A significant ethical problem is the irreproducibility of AI-generated images, as the randomness inherent in the generation process makes it impossible to replicate images with the same content, shapes, styles, or layouts using the same keyword prompts. The lack of reproducibility raises ethical questions about the validation and replication of the results, emphasizing the need for specifically tailored open-sourced models to address these concerns and maintain ethical standards in AI-generated images.
Verifying the outcomes relied on a mixed methodology approach encompassing both quantitative and qualitative aspects, grounded in human comprehension and duly accounting for the distinctive features of the examined samples [29].
The first verification method implements a mixed approach, utilizing the latest version (as of December 2023) of the Visual Attention Software (VAS) developed by the 3M company (3M VAS), https://vas.3m.com/ (accessed on 2 February 2024). This AI software was initially conceptualized for applications in product design, advertisements, signage, and understanding unconscious user attraction, and it has also been used in the domain of architectural heritage [30,31,32,33,34,35]. Through scans, the software produces fixation-point probability maps and fixation-point sequence estimations, providing insights into subconscious eye movement toward the content within the field of view during the first 3–5 s. The following are the main components of this analysis:
Heatmap: This visual representation illustrates the likelihood that each segment of an image attracts attention within the initial 3–5 seconds of observation. Features perceived during this brief timeframe carry a heightened potential for capturing the audience’s attention.
Hotspots: This numerical simplification of the heatmap outcomes reveals the content most likely to be observed: each region is assigned a numeric score predicting the probability of viewers directing their attention to that particular region during the first 3–5 seconds.
Gaze Sequence: This component delineates the four most probable gaze locations arranged by their anticipated viewing sequences within the first 3–5 seconds.
It is worth mentioning that this software carries no version number; it is updated automatically without version number changes.
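The 3M VAS engine itself is proprietary and cannot be reproduced here. Purely to illustrate what a fixation-probability heatmap of the kind described above looks like computationally, the following minimal Python sketch uses OpenCV’s open-source spectral-residual saliency detector as a rough stand-in; this is an assumed analogue for illustration only, not the 3M algorithm, and the file names are hypothetical.

```python
# Illustrative only: approximates a fixation-probability heatmap with
# OpenCV's spectral-residual saliency detector (opencv-contrib-python).
# This is NOT the proprietary 3M VAS model, merely an open-source analogue.
import cv2
import numpy as np

def saliency_heatmap(image_path: str) -> np.ndarray:
    image = cv2.imread(image_path)  # photograph or AI-generated image
    detector = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency = detector.computeSaliency(image)  # float map in [0, 1]
    if not ok:
        raise RuntimeError("Saliency computation failed")
    heatmap = (saliency * 255).astype("uint8")
    return cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)  # colour-coded heatmap

# Hypothetical comparison of an AI-generated image with a reference photo:
# ai_map = saliency_heatmap("ibn_tulun_midjourney.png")
# ref_map = saliency_heatmap("ibn_tulun_photo.jpg")
```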
The second verification method adopts a quantitative approach, implementing a survey to assess the legibility of AI-generated images as perceived by a selected population, thereby obtaining objective responses. The authors distributed an online questionnaire using Google Forms in December 2023. It aimed to gauge perceptions of the similarity between AI-generated images and photographs of the authentic counterparts of specific Islamic heritage monuments and regional styles. It utilized a 5-point scale ranging from “very similar” to “very dissimilar.” Several preceding studies used a similar scale [36,37].
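As a minimal illustration of how responses on this 5-point scale translate into the percentage distributions reported in Section 5.2.2, the following Python sketch tallies one set of ratings; the counts are back-calculated from the published Ka‘ba percentages rather than taken from the raw survey data.

```python
# Minimal sketch: convert 5-point Likert responses into percentage
# distributions of the kind reported in Section 5.2.2. The example counts
# (5/13/16/11/0 out of 45) are back-calculated from the published Ka'ba
# percentages, not taken from the raw survey data.
from collections import Counter

SCALE = ["very similar", "similar", "neutral", "dissimilar", "very dissimilar"]

def tally(responses: list[str]) -> dict[str, float]:
    counts = Counter(responses)
    total = len(responses)
    return {level: round(100 * counts.get(level, 0) / total, 1) for level in SCALE}

kaaba_responses = (["very similar"] * 5 + ["similar"] * 13
                   + ["neutral"] * 16 + ["dissimilar"] * 11)
print(tally(kaaba_responses))
# {'very similar': 11.1, 'similar': 28.9, 'neutral': 35.6,
#  'dissimilar': 24.4, 'very dissimilar': 0.0}
```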
The population involved 45 respondents, including some architectural academics but mainly senior undergraduate students and young graduate professionals with an Islamic art and architecture background, most of whom were at the University of Sharjah, UAE. This population was selected because (1) they were available to be contacted by the first author who taught them Islamic art and architecture courses for the previous three years, (2) they were familiar with Islamic art and architecture themes, and (3) they were a relatively young generation familiar with AI technology.
The evaluation criteria encompassed the monuments’ architectural and urban configuration, form, and outlines. The questionnaire comprised the following five inquiries:
  • Rate the similarity between (A), which shows an AI-generated image of Ka‘ba, and (B), which shows an actual photo.
  • Rate the similarity between (A), which shows an AI-generated image of the Dome of the Rock (Qubbat al-Sakhra), and (B), which shows an actual photo.
  • Rate the similarity between (A), which shows an AI-generated image of the spiral minaret of Ibn Tulun Mosque in Cairo, Egypt, and (B), which shows an actual photo.
  • Rate the similarity between (A), which shows an AI-generated image of the Ibn Tulun Mosque in Cairo, and (B), which shows an actual photo.
  • The following images show the minarets as crucial architectural elements in different sub-regions around the Islamic world. Please attempt to recognize the regions for each image (from A to D).
Figure 1 demonstrates the research design and the verification methodology of its results.
The research materials tested in the prompts include titles of landmarks of Islamic architectural heritage and short texts taken from key historical sources, often summarized to fit the required prompt length in Midjourney [38,39]. These sources provide a comprehensive collection of Islamic architectural styles, features, and historical significance. They were selected because they convey details of various regions of the Islamic world in short excerpts. Some prompts were generated with the help of ChatGPT and altered by the authors several times to reach the best possible results.
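The prompt-composition step itself amounts to simple text assembly. The following minimal sketch shows one way a landmark title, a summarized historical excerpt, and style keywords could be combined and trimmed to a character budget; the helper function and example text are illustrative assumptions, not part of any Midjourney API, and the 6000-character budget follows the practical limit discussed in Section 5.1.1.

```python
# Illustrative prompt-composition helper (hypothetical; not a Midjourney API).
# Combines a landmark title, a summarized historical excerpt, and style
# keywords, then trims the result to a character budget.
MAX_PROMPT_CHARS = 6000  # practical limit assumed in this study

def build_prompt(landmark: str, excerpt: str, keywords: list[str],
                 max_chars: int = MAX_PROMPT_CHARS) -> str:
    prompt = f"{landmark}. {excerpt} Style: {', '.join(keywords)}."
    if len(prompt) > max_chars:
        # Truncate on a word boundary so the prompt does not end mid-word.
        prompt = prompt[:max_chars].rsplit(" ", 1)[0]
    return prompt

example = build_prompt(
    "Bab al-Salah, the main gate of al-Qata'i'",
    "A monumental gate of the former Islamic capital of Egypt preceding "
    "Cairo, as described in the historical sources of al-Maqrizi ...",
    ["photorealistic", "ninth-century Islamic architecture", "Egypt"],
)
```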

4. Results

While Midjourney can produce high-end AI-generated images inspired by the Islamic architectural tradition, it currently falls short of presenting the actual appearance of some of its original structures. The analyses of the limitations in AI-generated Midjourney images of Islamic architectural heritage offered a structured framework to understand where Midjourney struggles and to identify areas for further research and improvement. As a result of this structured examination, this article found that the specified limits can be categorized into the following four main thematic groups:
  • Limits of the prompt
    • Length
    • Language
    • Numeracy
    • Controllability
  • Limits of fame
  • Limits of regionality and historical styles
  • Limits of architectural and urban elements and details

5. Discussions

5.1. The Limits

5.1.1. Limits of the Prompt

  • Length
The restricted length of the prompt, limited in practice to approximately 1500 words (6000 characters), can hinder users from providing detailed descriptions of intricate architectural elements, such as ornaments and calligraphy, commonly found in Islamic architectural heritage. While this length might seem sufficient to describe a building or a site, it is, in practice, limiting for some intricate structures and areas: the shortness of the prompt may not allow a comprehensive portrayal of these complex features. The results obtained from Midjourney improved when more comprehensive details were provided. Consequently, longer prompts contribute to more accurate outcomes.
Bab al-Salah, located in al-Qata‘i’, was a prominent gate in the former Islamic capital of Egypt, preceding Cairo. Historical sources, such as those of al-Maqrizi, suggest the city’s demise in 905 AD at the hands of the Abbasids. Regrettably, the once-thriving town has since vanished. Figure 2 demonstrates an example of two alternative architectural descriptions for the main city gate, Bab al-Salah, of al-Qata‘i’, based on the historical sources of al-Maqrizi, one generated with 67 characters (B) and the other with 1862 characters (A), where (A) is more realistic.
  • Language
The language used in the prompt also presents a categorical limitation. The integration of AI in generating images from natural language descriptions has prompted investigation into the impact of language diversity on text-to-image generators, showing a significant decline in the performance of these generators across various languages, particularly less commonly used ones. Consistent efficacy across languages is needed to ensure broader adoption by non-native English speakers and to promote linguistic diversity. Proposed remedies include incorporating diverse language data during training and implementing a natural language processing module with machine translation capabilities. Challenges arise from the lack of human-annotated images for many languages, limiting the capabilities of generators for minority languages [40,41]. Figure 3 demonstrates AI-generated images with several English prompts and their Arabic counterparts. Those produced by the English-language prompts consistently provided better results. This bias is likely because the datasets and databases primarily comprise English content, while Arabic content constitutes a relatively smaller percentage. Such linguistic bias significantly affects the accuracy and quality of the outcomes generated for non-English prompts.
  • Numeracy
The quantitative limitation of the prompt word count extends to a qualitative limitation of the numeric information included in the text. Complex numerical calculations and precise quantitative specifications may be challenging to convey within the restricted character count. This limitation could impede the generation of accurate and contextually appropriate results for prompts requiring numerical input. Text-to-image generators often struggle to count and to accurately represent quantities in their generated images. This limitation stems from the nature of their training and their lack of understanding of physical objects and their referents. Unlike humans, who have a comprehension of various things in the physical world, text-to-image generators rely solely on visual representations, which are patterns of pixels labeled with specific categories. They lack the rich sensory experiences and contextual understanding that humans possess.
Generators like DALL·E and Midjourney that convert text to images employ statistical sampling methods to create distinct visual representations, capturing data in multiple forms. Despite their uniqueness, these generators differ significantly from traditional image-creation processes by generating images representing various entities rather than a singular one within the image. A prominent challenge these generators face, particularly exemplified by Midjourney, is the difficulty in accurately counting or representing specified quantities. This counting problem is attributed to a fundamental representation issue rather than biases in training data. Unlike humans who understand images through real-world experiences, text-to-image generators lack awareness of the physical referents depicted, leading to inaccuracies in outputs. This limitation stems from computational formalism, where visual representations are assumed to provide sufficient information about the object’s nature, highlighting the need for a deeper understanding of referents beyond visual patterns [42].
The Great Umayyad Mosque, completed in the 8th century during the Umayyad Caliphate, is a significant Islamic architectural marvel that embodies historical and cultural richness. The prompt used to produce the images in Figure 4 included a detailed architectural description of the Umayyad Mosque, with given dimensions for the heights and widths of several architectural elements, but the results were far from reality. Strikingly, the number of minarets in the Midjourney AI-generated images was inaccurate (one or two), whereas there are three in reality.
  • Controllability
In the current commercial version of Midjourney, there are limitations in controllability. However, scholars have tried to address this issue and enhance control in text-to-image generators. One such endeavor is presented by Ku and Lee [13], who introduced TextControlGAN, a novel GAN-based text-to-image synthesis model. Unlike existing models that utilize the conditional GAN (cGAN) framework, TextControlGAN leverages the ControlGAN-based framework to improve text-conditioning capabilities. The model incorporates an independent regressor and employs Data Augmentation (DA) techniques during training. In artistic text style transfer, the challenge of controlling the stylistic degree of shape deformation persists. Yang et al. [43] addressed this challenge by introducing a text-style transfer network that enables real-time control over the stylistic degree of the glyph through an adjustable parameter. Their novel bidirectional shape-matching framework establishes effective glyph-style mapping without paired ground truth. A scale-controllable module empowers a single network to characterize multi-scale shape features of the style image and transfer them to the target text. This method surpasses previous state-of-the-art techniques, generating diverse, controllable, and high-quality stylized text. While AI image generators can produce visually appealing images, they often pose challenges when users seek results close to the original forms. Even with lengthy prompts, the resulting image may fall short of expectations for accurate representation due to limited control over the creative process. The generated output may not align perfectly with the user’s desired visual representation even after several iterations. Figure 5 shows how Midjourney could not visualize the minaret of Ibn Tulun Mosque in Cairo, Egypt, completed in 879 AD, despite being given a clear architectural description. While adding the word spiral improved the result somewhat, it did not help to capture the original form of the spiral minaret of the building.

5.1.2. Limits of Fame

Throughout multiple attempts to visualize various iconic buildings associated with Islamic architecture during the Midjourney visualization process, the outcomes proved successful and precise when representing globally renowned structures like the Ka‘ba in Mecca (today in Saudi Arabia), Islam’s holiest site, believed by Muslims to have been built by Prophet Ibrahim, and the Dome of the Rock in Jerusalem, completed in the late seventh century AD as a symbol of Islamic architecture and religious significance (Figure 6). However, when attempting to visualize buildings with less fame, such as the Umayyad Mosque, Ibn Tulun Mosque, and Sultan Hassan Mosque, the AI visualization fell short of providing a clear and accurate depiction (Figure 7). Despite these mosques’ significant artistic and architectural value, the AI algorithm encountered difficulties recognizing and accurately portraying their specific architectural elements and style, even with a detailed description in the prompt. It could neither simulate the actual building nor generate one in a similar manner.
This limitation can be attributed to the restricted presence of these buildings in the database utilized by the Midjourney algorithms. It indicates a crucial requirement for a broader and more diverse range of architectural examples to enhance the AI’s ability to visualize lesser-known yet noteworthy monuments and sites of Islamic architecture. By incorporating a more comprehensive array of regional masterpieces into the training data, the AI model would better understand the diverse architectural styles and characteristics of Islamic architecture, ultimately improving its capacity to generate accurate visual representations of such vital structures.
This limit of fame appears in the top two images in Figure 5 that resemble the Umayyad Mosque of Aleppo, which is less renowned than its counterpart in Damascus. It gained internet popularity after its minaret was bombed in 2012 during the Syrian conflict [44].

5.1.3. Limits of Regionality and Historical Styles

Effectively representing the unique characteristics that differentiate various regions within the Islamic world remains a complex challenge. Islamic architecture encompasses multiple styles, motifs, and architectural elements shaped by historical, geographical, and cultural factors specific to each region. Unfortunately, existing AI image generators encounter difficulties when attempting to incorporate these distinctive regional traits into their generated images.
The minarets, recognized as an architecturally prominent feature within Islamic cities, were employed in this article as an illustrative model to explore regional variations in AI representation (Figure 8). The authors conducted a series of prompt-based experiments to trace the distinctive characteristics of minarets across eight sub-regions of the Islamic world: Egypt, the Levant, North Africa, West Africa, Arabia, Turkey, East Africa, and the Far East.
In most of the examined sub-regions, Midjourney effectively captured the distinctive characteristics of minarets in urban and rural contexts. For instance, the sand and mud tones were identifiable in North, Eastern, and Western Africa (Figure 8A,C,G), while the prevalence of forestry was notable in the Far East (Figure 8B). Regarding architectural styles, the results demonstrated that Midjourney can identify various building materials, architectural forms, and outlines found in the Islamic world. Furthermore, it successfully depicted the hybrid architectural styles observed in the Far East (Figure 8B), which combine Islamic, Asian, and local elements. However, in some cases, the scale is exaggerated (Figure 8E). Based on historical sources, large cities of Arabia, such as Hadramout and Sanaa, were known for their high minarets; however, these were not as tall as displayed in the abovementioned image. In Figure 8H, the generated minarets share some common traits with the famous pencil-shaped minarets of Turkey; however, the dimensions of some parts are less accurate compared to real-life examples.

5.1.4. Limits of Architectural and Urban Elements and Details

Midjourney successfully depicted the distinctive features of Islamic architecture through the effective utilization of straightforward and unambiguous geometric shapes, notably exemplified by the representation of the Ka‘ba (a cube), the Dome of the Rock (a dome over octagonal extruded walls), and the Malwiya (a spiral geometry). However, the depiction of more complex examples from later periods of Islamic architecture, such as the Mosque of Ibn Tulun with its central dome building within a courtyard setting, proved to be comparatively less accurate (Figure 7). Figure 9 demonstrates a clear drawback in AI regarding its limited ability to accurately incorporate precise details such as ornaments, letters, words, calligraphy, and symbols within the generated images. This is an ironic contrast to the AI’s proficiency in understanding and interpreting textual prompts for image generation. While the texts in these images appear very detailed and close to the notion of Arabic calligraphy, they are not readable. Requesting specific text often leads to mismatched letters or unintelligible combinations.

5.2. Validation

5.2.1. Technical Validation using Visual Attention Analysis

This section analyzes selected examples of Islamic architecture using the visual attention software to compare predicted human visual attention for the actual building image and its AI-generated counterpart.
In the case of the Ka‘ba, the heat map of the original photo centered on the ornaments and minarets, while the AI-generated images emphasized the building’s outlines. Gaze sequences also differ: the original image focuses first on the Ka‘ba, whereas the AI counterpart prioritizes the minarets and then shifts attention to the Ka‘ba (Figure 10).
As for the Dome of the Rock, the original and AI-generated heat maps highlighted the golden dome and the front facade. The highlights were similar, with slight differences. The gaze sequence in the original image accentuated the frontal part of the facade and the lower section of the golden dome, providing a nuanced contrast to the AI-generated version (Figure 11).
In the case of the Minaret of Ibn Tulun Mosque, the original and AI-generated images exhibit a prominent heat map emphasizing intricate details on the front of the minaret. Interestingly, both images prioritize the top of the minaret, but the gaze sequence diverges. The analyses of the actual photo focus vertically, while the analyses of the AI-generated counterpart emphasize the stairs (Figure 12).
Examining the Mosque of Ibn Tulun, the heat map of the original photo highlighted the dome, minaret, and arches, while the AI-generated images centered on the frontal facades. Gaze sequences revealed a distinction: the photo prioritized the dome and minaret, while the AI-generated image’s sequence revolved around the facade’s openings (Figure 13).
The outcomes of the visual attention analyses reveal a notable success of AI image generators in approximating human visual perception of architecture, aligning with visual attention principles. The heat maps and gaze sequence analyses indicate a discernible correspondence between the original and AI-generated images. This alignment with visual attention concepts implies that the AI software was able to identify significant features and patterns capturing human attention in architectural observation. The analyses underscored the effective translation of visual attention principles into image generation, showcasing AI’s potential to emulate human perception in architectural imagery. These findings highlight substantial strides in AI image generation, emphasizing its potential to contribute significantly to comprehending and representing complex visual stimuli, particularly in intricate forms like architecture.

5.2.2. Human Validation through Survey

An online survey was conducted to measure the extent of success of the AI-generated images produced by Midjourney for the same architectural examples presented above. The survey results unveiled a range of responses across the assessed images. For the AI-generated image of the Ka‘ba compared to the original photo, 11.1% found them very similar, 28.9% rated them as similar, 35.6% remained neutral, and 24.4% deemed them dissimilar, with no respondents rating them as very dissimilar. Similarly, for the Dome of the Rock AI-generated image versus the actual photo, 11.1% perceived them as very similar, 46.7% as similar, 33.3% as neutral, and 8.9% as dissimilar, with no respondents categorizing them as very dissimilar. The AI-generated image of the spiral minaret of Ibn Tulun Mosque in Cairo garnered 0% very similar ratings, 8.9% similar, 22.2% neutral, 48.9% dissimilar, and 20% very dissimilar. For the AI-generated image of the Ibn Tulun Mosque in Cairo compared to the actual photo, 6.7% found them very similar, 15.6% similar, 24.4% neutral, 37.8% dissimilar, and 15.6% very dissimilar. All in all, the cases of the Ka‘ba and the Dome of the Rock were clearly more successful than those of the Ibn Tulun Mosque and its minaret (Figure 14).
Lastly, Figure 15 shows that in all cases of the minaret styles, the correct answer prevails, with a large margin over the incorrect answers. Generally, the respondents were able to answer the question correctly at a rate of at least 50%, meaning that the recognition rate of the AI-generated images by Midjourney ranged between 50% and 73.9%. Southeast Asia was the most recognized sub-region in the identification of minarets around the Islamic world, with 73.9%, while West Africa saw the lowest recognition, with 50%. These outcomes emphasize the diverse perspectives on visual similarity and regional architectural familiarity among the survey participants. Overall, Midjourney captured the essence of the minaret as an architectural element, with different success rates across the four sub-regions of Islamic architecture examined.
Both Figure 14 and Figure 15 reveal that in all cases of the studied monuments and styles, Midjourney was successful to an extent, depending on the limiting factors of each case. For example, in the case of the minaret of Southeast Asia, Midjourney was successful to a great extent, seemingly because of the prominent pagoda-like architectural elements and probably the greenery in the urban background. On the other hand, the least successful case was the minaret of the Ibn Tulun Mosque, most likely because of Midjourney’s limited ability to accurately present the architectural element of the spiral stairs.

6. Conclusions

This article concludes that while Midjourney can produce high-end AI-generated images inspired by the structures and sites of the Islamic tradition, it currently falls short of presenting the actual appearance of some of the original structures. It defined some primary current intertwined limits in AI-generated representations of Islamic architectural heritage by deploying analytical experimentation with this tool. Table 1 summarizes the limitations of the Midjourney system in representing Islamic architectural heritage. It categorizes factors such as content length, language diversity, numeracy, controllability, fame levels, regional/historical constraints, and specificity of architectural elements. This comprehensive analysis provides valuable insights for enhancing the system’s performance in generating nuanced representations.
In terms of triangulation, both validation methods reveal that Midjourney’s AI-generated images were relatively successful, depending on the case. However, it is worth mentioning that objectivity in the survey method is higher than in the VAS method, because the survey judgment was based on human intelligence from a relatively large number of observers (the surveyed population), whereas the VAS results were interpreted by a relatively small number of observers (the authors).

6.1. Research Limitation

This research confronts some limitations that should be carefully considered in interpreting its outcomes. Firstly, this research was somewhat confined by its reliance on a limited number of cases and participants in the questionnaire; however, this should not significantly affect the outcome. Secondly, it was exclusively dependent on Midjourney; different virtual labs or tools within the field may yield different results. This research also unfolds in a rapidly evolving discipline, and the swift pace of advancements in AI and virtual reality technologies may render certain aspects of it outdated. Moreover, the sparse body of literature on AI-generated representations of Islamic architectural heritage limits the depth of analysis, hindering comprehensive comparisons and potentially influencing the robustness of the findings. The validation process, relying on a combination of human subjectivity and AI objectivity, also presents some limitations; nevertheless, it is crucial to this comparative investigation between artificial and human intelligence. Nonetheless, the utilization of this methodology offers innovation, particularly in its application to the domain of Islamic architectural heritage.

6.2. Future Research

This article contributes to future advancements by focusing on specific boundary constraints and guiding efforts to enhance AI algorithms, training models, and datasets. The insights gained can also aid in developing specialized AI-generative systems tailored for Islamic architectural heritage, allowing for more accurate representation and generation of images in this unique domain. Furthermore, AI generation of ideas is a creative tool that can be used extensively in many domains, such as cultural tourism [48] and active teaching and learning in a field like Islamic architectural heritage; hence, knowing its limitations is essential to setting the limits of its interactivity and usefulness [49,50]. Such applications could be developed in connection with Experiential Learning Theory in order to achieve better site analysis in architectural and urban design [51]. Furthermore, AI generation can be used in research on the adaptive reuse of heritage sites [52,53].
Future research could define other major and minor limits in this field, especially as AI techniques and methods evolve and develop. They could examine the effect of several rounds of iterations on the outcome. They can polish the methodology used in this article by focusing on the application of iterative simulation of traditional design to participate in answering part of a question raised by Leach about the differentiation between how “artificial” AI is and how misguided our understanding of human intelligence has been [54]. They can compare AI-generated images to those created by natural human intelligence or produced through both AI and HI, which can contribute to what Cantrel and Zhang called “a third intelligence” [55]. Particularly, they could develop a quantitative approach for measurements, including dealing with complex sensitivity issues, especially since the newer versions of Midjourney (V6 released very recently) claim increased sensitivity to prompts [56].
Another study area would involve conducting cross-cultural analyses to assess the effectiveness of AI-generated representations across diverse cultural and architectural contexts, including non-Islamic architectural styles. By comparing AI algorithm outputs from different regions, researchers can gain insights into how regional factors influence the generation of architectural images. A comparative analysis between the AI-generated pictures of Islamic architecture and Western architecture could provide valuable insights into the performance of Midjourney. This comparison could help identify potential biases in AI’s training data and understand how well AI can handle different architectural styles.
Last but not least, ethical and cultural considerations form another vital aspect of research in this field, necessitating an examination of potential issues such as cultural appropriation or misrepresentation that may arise from AI-generated representations.

Author Contributions

Conceptualization, A.W.S., M.W.F.; methodology, A.W.S., M.W.F., M.W.Y., I.I. and K.A.K.S.; software, A.W.S., M.W.F. and S.B.A.; validation, A.W.S., M.W.F. and S.B.A.; formal analysis, A.W.S., M.W.F., I.I. and K.A.K.S.; investigation, A.W.S., M.W.F., I.I. and K.A.K.S.; resources, A.W.S.; data curation, A.W.S. and M.W.F.; writing—original draft, A.W.S. and M.W.F.; writing—review & editing, A.W.S., M.W.F., M.W.Y., S.B.A., I.I. and K.A.K.S.; visualization, A.W.S. and M.W.F.; supervision, A.W.S.; project administration, A.W.S.; funding acquisition, A.W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Sharjah through a Seed research project entitled Islamic Art-ificial Intelligence Design Impact of Artificial Intelligence on the Perception and Application of Contemporary Islamic Art and Architecture (submitted 2022, accepted June 2023, grant number 2303071011).

Data Availability Statement

The data presented in this study are available in the article.

Acknowledgments

The authors would like to thank the University of Sharjah for its significant support. Special thanks go to Abdul Wahab Bin Mohammad, College of Engineering, Emad Mushtaha, Department of Architectural Engineering, and Nadia M. Alhasani, College of Fine Arts and Design (CFAD), for their administrative support. Thanks also to the 3M Visual Attention Software team for their cooperation.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the study’s design, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

References

  1. Sukkar, A.W.; Fareed, M.W.; Yahia, M.W.; Mushtaha, E.; De Giosa, S.L. Artificial Intelligence Islamic Architecture (AIIA): What Is Islamic Architecture in the Age of Artificial Intelligence? Buildings 2024, 14, 781. [Google Scholar] [CrossRef]
  2. Bevilacqua, M.G.; Caroti, G.; Piemonte, A.; Ulivieri, D. Reconstruction of Lost Architectural Volumes by Integration of Photogrammetry from Archive Imagery with 3-D Models of the Status Quo. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 119–125. [Google Scholar] [CrossRef]
  3. Balletti, C.; Dabrowski, M.; Guerra, F.; Vernier, P. Digital Reconstruction of the Lost San Geminiano’s Church in San Marco’s Square, Venice. In Proceedings of the IMEKO TC-4 International Conference on Metrology for Archaeology and Cultural Heritage, Trento, Italy, 22–24 October 2020; pp. 1–5. Available online: https://www.imeko.org/publications/tc4-Archaeo-2020/IMEKO-TC4-MetroArchaeo2020-057.pdf (accessed on 1 January 2024).
  4. Sabri, R.; Abdalla, S.B.; Rashid, M. Towards a Digital Architectural Heritage Knowledge Management Platform: Producing the HBIM Model of Bait al Naboodah in Sharjah, UAE. In Proceedings of the 12th International Conference on Structural Analysis of Historical Constructions, Online Event, 29 September–1 October 2021; Roca, P., Pelà, L., Molins, C., Eds.; SAHC 2021. International Centre for Numerical Methods in Engineering, CIMNE: Barcelona, Spain, 2021; pp. 1641–1650. [Google Scholar] [CrossRef]
  5. Aburamadan, R.; Moustaka, A.; Trillo, C.; Makore, B.C.N.; Udeaja, C.; Gyau Baffour Awuah, K. Heritage Building Information Modelling (HBIM) as a Tool for Heritage Conservation: Observations and Reflections on Data Collection, Management and Use in Research in a Middle Eastern Context. In Culture and Computing: Interactive Cultural Heritage and Arts, HCII 2021; Lecture Notes in Computer Science; Rauterberg, M., Ed.; Springer International Publishing: Cham, Switzerland, 2021; Volume 12794, pp. 3–14. [Google Scholar] [CrossRef]
  6. Abdalla, S.B.; Rashid, M.; Yahia, M.W.; Mushtaha, E.; Opoku, A.; Sukkar, A.; Maksoud, A.; Hamad, R. Comparative Analysis of Building Information Modeling (BIM) Patterns and Trends in the United Arab Emirates (UAE) Compared to Developed Countries. Buildings 2023, 13, 695. [Google Scholar] [CrossRef]
  7. Günay, S. Virtual Reality for Lost Architectural Heritage Visualization Utilizing Limited Data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, XLVI-2/W1-2022, 253–257. [Google Scholar] [CrossRef]
  8. Pietroni, E.; Ferdani, D. Virtual Restoration and Virtual Reconstruction in Cultural Heritage: Terminology, Methodologies, Visual Representation Techniques, and Cognitive Models. Information 2021, 12, 167. [Google Scholar] [CrossRef]
  9. Strobelt, H.; Webson, A.; Sanh, V.; Hoover, B.; Beyer, J.; Pfister, H.; Rush, A.M. Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. IEEE Trans. Vis. Comput. Graph. 2022, 29, 1146–1156. [Google Scholar] [CrossRef] [PubMed]
  10. Oppenlaender, J.; Linder, R.; Silvennoinen, J. Prompting AI Art: An Investigation into the Creative Skill of Prompt Engineering. arXiv 2023, arXiv:2303.13534. [Google Scholar] [CrossRef]
  11. White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv 2023, arXiv:2302.11382v1. [Google Scholar] [CrossRef]
  12. Ramzan, S.; Iqbal, M.M.; Kalsum, T. Text-to-Image Generation Using Deep Learning. Eng. Proc. 2022, 20, 16. [Google Scholar] [CrossRef]
  13. Ku, H.; Lee, M. TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks. Appl. Sci. 2023, 13, 5098. [Google Scholar] [CrossRef]
  14. Abdallah, Y.K.; Estévez, A.T. Biomaterials Research-Driven Design Visualized by AI Text-Prompt-Generated Images. Designs 2023, 7, 48. [Google Scholar] [CrossRef]
  15. Repenning, A.; Grabowski, S. Prompting is Computational Thinking. In Proceedings of the IS-EUD 2023: 9th International Symposium on End-User Development, Cagliari, Italy, 6–8 June 2023; Available online: https://ceur-ws.org/Vol-3408/short-s2-07.pdf (accessed on 1 January 2024).
  16. Göring, S.; Ramachandra Rao, R.R.; Merten, R.; Raake, A. Analysis of Appeal for Realistic AI-generated Photos. IEEE Access 2023, 11, 38999–39012. [Google Scholar] [CrossRef]
  17. Ruskov, M. Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales. arXiv 2023, arXiv:2302.08961v2. [Google Scholar] [CrossRef]
  18. Oppenlaender, J. The Creativity of Text-to-Image Generation. In Proceedings of the 25th International Academic Mindtrek Conference, Tampere, Finland, 16–18 November 2022; pp. 192–202. [Google Scholar] [CrossRef]
  19. Lyu, Y.; Wang, X.; Lin, R.; Wu, J. Communication in Human–AI Co-creation: Perceptual Analysis of Paintings Generated by the Text-to-image System. Appl. Sci. 2022, 12, 11312. [Google Scholar] [CrossRef]
  20. Chen, J.; Shao, Z.; Hu, B. Generating Interior Design from Text: A New Diffusion Model-Based Method for Efficient Creative Design. Buildings 2023, 13, 1861. [Google Scholar] [CrossRef]
  21. Zhang, Z.; Fort, J.M.; Giménez Mateu, L. Exploring the Potential of Artificial Intelligence as a Tool for Architectural Design: A Perception Study Using Gaudí’s Works. Buildings 2023, 13, 1863. [Google Scholar] [CrossRef]
  22. Barandy, K. Alternative Histories: Iconic Architecture Reimagined in Different Styles Using AI. Available online: https://www.designboom.com/architecture/getagent-iconic-architecture-reimagined-ai-buildings-different-architectural-styles-ai-midjourney-04-03-2023/ (accessed on 1 January 2024).
  23. Najafian, K. Maximalist AI Explorations Reimagine the Versailles Palace with Mesmerizing Gold Facades; Mango, Z., Ed.; Designboom: Milan, Italy; New York, NY, USA; Beijing, China; Tokyo, Japan, 2022; Available online: https://www.designboom.com/architecture/maximalist-ai-explorations-versailles-palace-gold-facades-kaveh-najafian-09-15-2022/ (accessed on 1 January 2024).
  24. Betsky, A. The Voyage Continues: Designers Use Midjourney to Reimagine Capri. 2022. Available online: https://www.architectmagazine.com/design/the-voyage-continues-designers-use-midjourney-to-reimagine-capri_o (accessed on 1 January 2024).
  25. Khan, R. Midjourney Reinvents Ancient Ziggurat Pyramid as Modern Cultural Landmarks. 2023. Available online: https://www.designboom.com/architecture/midjourney-ancient-ziggurat-pyramid-temple-modern-arts-venue-rolando-cedeno-de-la-cruz-04-27-2023/ (accessed on 1 January 2024).
  26. Göring, S.; Ramachandra Rao, R.R.; Merten, R.; Raake, A. Appeal and Quality Assessment for AI-generated Images. In Proceedings of the 15th International Conference on Quality of Multimedia Experience (QoMEX), Ghent, Belgium, 20–22 June 2023; pp. 115–118. [Google Scholar] [CrossRef]
  27. Gibney, E. Is AI Fuelling a Reproducibility Crisis in Science? Nature 2022, 608, 250–251. [Google Scholar] [CrossRef]
  28. Kang, Y.; Zhang, Q.; Roth, R. The Ethics of AI-Generated Maps: A Study of DALL·E 2 and Implications for Cartography. arXiv 2023, arXiv:2304.10743v3. [Google Scholar] [CrossRef]
  29. Creswell, J.W. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches; Sage: Thousand Oaks, CA, USA, 2014. [Google Scholar]
  30. Shi, Y.; Du, J.; Ragan, E. Review Visual Attention and Spatial Memory in Building Inspection: Toward a Cognition-driven Information System. Adv. Eng. Inform. 2020, 44, 101061. [Google Scholar] [CrossRef]
  31. Villegas, E.; Fonts, E.; Fernández, M.; Fernández-Guinea, S. Visual Attention and Emotion Analysis Based on Qualitative Assessment and Eye-tracking Metrics—The Perception of a Video Game Trailer. Sensors 2023, 23, 9573. [Google Scholar] [CrossRef]
  32. Salama, A.M.; Salingaros, N.A.; MacLean, L. A Multimodal Appraisal of Zaha Hadid’s Glasgow Riverside Museum—Criticism, Performance Evaluation, and Habitability. Buildings 2023, 13, 173. [Google Scholar] [CrossRef]
  33. Li, N.; Zhang, S.; Xia, L.; Wu, Y. Investigating the Visual Behavior Characteristics of Architectural Heritage Using Eye-Tracking. Buildings 2022, 12, 1058. [Google Scholar] [CrossRef]
  34. Lavdas, A.A.; Salingaros, N.A. Architectural Beauty: Developing a Measurable and Objective Scale. Challenges 2022, 13, 56. [Google Scholar] [CrossRef]
  35. Lavdas, A.A.; Salingaros, N.A.; Sussman, A. Visual Attention Software: A New Tool for Understanding the ‘Subliminal’ Experience of the Built Environment. Appl. Sci. 2021, 11, 6197. [Google Scholar] [CrossRef]
  36. Mushtaha, E.; Abu Dabous, S.; Alsyouf, I.; Ahmed, A.; Raafat Abdraboh, N. The Challenges and Opportunities of Online Learning and Teaching at Engineering and Theoretical Colleges during the Pandemic. Ain Shams Eng. J. 2022, 13, 101770. [Google Scholar] [CrossRef]
  37. Alalouch, C. Cognitive Styles, Gender, and Student Academic Performance in Engineering Education. Educ. Sci. 2021, 11, 502. [Google Scholar] [CrossRef]
  38. Peterson, A. Dictionary of Islamic Architecture; Routledge: London, UK, 1996; pp. 187–190. [Google Scholar]
  39. Bloom, J.; Blair, S. (Eds.) Grove Encyclopedia of Islamic Art and Architecture, 3 Volumes; Oxford University Press: Oxford, UK, 2009; Volume 2. [Google Scholar]
  40. Reviriego, P.; Merino-Gómez, E. Text to Image Generation: Leaving no Language Behind. arXiv 2022, arXiv:2208.09333v2. [Google Scholar] [CrossRef]
  41. Liu, S.; Leng, D.; Yin, Y. Bridge Diffusion Model: Bridge non-English Language-native Text-to-image Diffusion Model with English Communities. arXiv 2023, arXiv:2309.00952v1. [Google Scholar] [CrossRef]
  42. Wasielewski, A. Midjourney Can’t Count: Questions of Representation and Meaning for Text-to-Image Generators. Interdiscip. J. Image Sci. 2023, 37, 71–82. [Google Scholar] [CrossRef]
  43. Yang, S.; Wang, Z.; Wang, Z.; Xu, N.; Liu, J.; Guo, Z. Controllable Artistic Text Style Transfer via Shape-Matching GAN. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4442–4451. [Google Scholar] [CrossRef]
  44. Alafandi, R.; Rahim, A.A. Umayyad Mosque in Aleppo Yesterday, Today and Tomorrow. Int. J. Arts Sci. 2014, 7, 319–347. Available online: https://www.universitypublications.net/ijas/0705/pdf/H4V574.pdf (accessed on 29 December 2023).
  45. Karim, M.M. Kaaba Mirror. Wikipedia 2007. Available online: https://en.m.wikipedia.org/wiki/File:Kaaba_mirror_edit_jj.jpg (accessed on 29 November 2023).
  46. Franco, S. Dome of the Rock. Unsplash 2019. Available online: https://unsplash.com/photos/blue-and-brown-mosque-ex9KQrN1mj0 (accessed on 29 November 2023).
  47. Tahoon, A. Minaret of Ahmed Ibn Tulun Mosque. Wikipedia 2018. Available online: https://ar.m.wikipedia.org/wiki/%D9%85%D9%84%D9%81:Minaret_of_Ahmed_Ibn_Tulun_Mosque.jpg (accessed on 29 November 2023).
  48. Fareed, M.W.; Amer, M. People-centred Natural Language Processing for Cultural Tourism Market: A Research Agenda. In Proceedings of the 2nd International Satellite Conference on Visual Pattern Extraction and Recognition for Cultural Heritage Understanding; CEUR-WS Workshop: Zadar, Croatia, 2023; p. 3600. [Google Scholar]
  49. Sukkar, A.; Yahia, M.W.; Mushtaha, E.; Maksoud, A.; Abdalla, S.B.; Nasif, O.; Melahifci, O. Applying Active Learning Method to Enhance Teaching Outcomes in Architectural Engineering Courses. Open House Int. 2024, 49, 205–220. [Google Scholar] [CrossRef]
  50. Sukkar, A.; Yahia, M.W.; Mushtaha, E.; Maksoud, A.; Nassif, O.; Melahifci, O. The Effect of Active Teaching on Quality Learning: Students’ Perspective in an Architectural Science Course at the University of Sharjah. In Proceedings of the 2022 Advances in Science and Engineering Technology International Conferences (ASET), IEEE Xplore, Dubai, United Arab Emirates, 21–24 February 2022; pp. 1–6. [Google Scholar] [CrossRef]
  51. Yahia, M.W.; Abdalla, S.B.; Sukkar, A.; Saleem, A.A.; Maksoud, A.M. Towards Better Site Analysis in Architectural and Urban Design: Adapting Experiential Learning Theory in Post-COVID Architectural Teaching Methods. Arch. Des. Res. 2023, 36, 51–65. [Google Scholar] [CrossRef]
  52. Duan, Q.; Qi, L.; Cao, R.; Si, P. Research on Sustainable Reuse of Urban Ruins Based on Artificial Intelligence Technology: A Study of Guangzhou. Sustainability 2022, 14, 14812. [Google Scholar] [CrossRef]
  53. Amro, D.K.; Sukkar, A.; Yahia, M.W.; Abukeshek, M.K. Evaluating the Cultural Sustainability of the Adaptive Reuse of Al-Nabulsi Traditional House into a Cultural Center in Irbid, Jordan. Sustainability 2023, 15, 13198. [Google Scholar] [CrossRef]
  54. Leach, N. Design in the Age of Artificial Intelligence. Landsc. Archit. Front. 2018, 6, 8–19. [Google Scholar] [CrossRef]
  55. Cantrell, B.; Zhang, Z. A Third Intelligence. Landsc. Archit. Front. 2018, 6, 42–51. [Google Scholar] [CrossRef]
  56. Foy, P. Getting Started with Midjourney V6. 2023. Available online: https://www.mlq.ai/getting-started-with-midjourney-v6/ (accessed on 1 January 2024).
Figure 1. Diagrammatic demonstration of the research design methodology and variables.
Figure 2. Two Midjourney AI-generated photos of the Bab al-Sala gate produced from prompts describing its architecture: (B) from a 67-character prompt and (A) from an 1862-character prompt, where (A) is more realistic. The prompts were produced with the help of ChatGPT.
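To make the prompt-length comparison concrete, the following Python sketch assembles a short and an extended prompt for the same subject and reports their character counts before they would be pasted into Midjourney's Discord interface. The descriptive fragments and the build_prompt helper are illustrative placeholders, not the prompts actually used for Figure 2.

```python
# Illustrative sketch of the prompt-length comparison in Figure 2.
# The descriptive fragments are placeholders, not the study's actual prompts.

BASE_SUBJECT = "Bab al-Sala gate, photorealistic exterior view"

ARCHITECTURAL_DETAILS = [
    "pointed horseshoe arch framed by a carved stone portal",
    "alternating ablaq masonry courses in ochre and cream",
    "a band of Arabic calligraphy running above the lintel",
    "two flanking buttresses with muqarnas corbels",
    "weathered timber doors studded with iron nails",
]

def build_prompt(subject: str, details: list[str], max_chars: int) -> str:
    """Append descriptive fragments to the subject until the character
    budget is reached, mimicking short versus long prompts."""
    prompt = subject
    for fragment in details:
        candidate = f"{prompt}, {fragment}"
        if len(candidate) > max_chars:
            break
        prompt = candidate
    return prompt

if __name__ == "__main__":
    short_prompt = build_prompt(BASE_SUBJECT, ARCHITECTURAL_DETAILS, max_chars=70)
    long_prompt = build_prompt(BASE_SUBJECT, ARCHITECTURAL_DETAILS, max_chars=2000)
    print(f"Short prompt ({len(short_prompt)} characters): {short_prompt}")
    print(f"Long prompt ({len(long_prompt)} characters): {long_prompt}")
```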
Figure 3. Midjourney AI-generated photos using the prompts "examples of Ottoman architecture" (top right) and "examples of Mamluk architecture" (top left), each given in (A) Arabic and (B) English.
Figure 4. Midjourney AI-generated photos using the prompt “The Umayyad Mosque has three minarets”.
Figure 5. Midjourney AI-generated photos: (A) using the prompt “Ibn Tulun Minaret”, (B) using the prompt “Ibn Tulun Spiral Minaret”.
Figure 6. Midjourney AI-generated photos: (A) using the prompt “Dome of the Rock” and (B) using the prompt “The Kaaba, also known as the Cube, is a stone building located at the center of the Masjid al-Haram in Mecca, Saudi Arabia. It is considered the most sacred site in Islam and is revered as the House of God by Muslims worldwide”. The prompt for (B) was produced with the help of ChatGPT.
Figure 7. Midjourney AI-generated photos using the prompt “Mosque of Ibn Tulun”.
Figure 8. Midjourney AI-generated photos using a combination of prompts describing the minarets within eight sub-regions in the Islamic world: (A) East Africa, (B) Far East, (C) West Africa, (D) Egypt, (E) Arabia, (F) Levant, (G) North Africa, and (H) Turkey. The prompts were taken from [38].
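The regional comparison in Figure 8 can be approximated by pairing each of the eight sub-regions with a short stylistic description and templating one prompt per region, as in the sketch below. The descriptions are hypothetical stand-ins rather than the wording drawn from [38], and format_prompt is an illustrative helper rather than part of any Midjourney tooling.

```python
# Illustrative sketch of composing region-specific minaret prompts (cf. Figure 8).
# The stylistic descriptions are hypothetical placeholders, not quotations from [38].

REGIONAL_MINARET_STYLES = {
    "East Africa": "square coral-stone shaft with a plain parapet",
    "Far East": "pagoda-influenced tiered tower with upturned eaves",
    "West Africa": "tapering adobe tower with projecting timber beams",
    "Egypt": "multi-tiered stone minaret with a carved balcony",
    "Arabia": "slender cylindrical shaft with a single balcony",
    "Levant": "octagonal stone shaft with a pointed finial",
    "North Africa": "square tower with geometric tile panels",
    "Turkey": "pencil-thin fluted minaret with a conical cap",
}

def format_prompt(region: str, description: str) -> str:
    """Template a text-to-image prompt for one sub-region's minaret."""
    return (f"A minaret in the {region} tradition of Islamic architecture: "
            f"{description}, photorealistic exterior view")

if __name__ == "__main__":
    for region, description in REGIONAL_MINARET_STYLES.items():
        print(format_prompt(region, description))
```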
Figure 9. Midjourney AI-generated photos using the prompt: “Arabic calligraphy and ornaments on the external facade of Islamic architecture”.
Figure 10. The VAS analysis of both AI-generated and original images of the Kaaba. Rows: 1st, original photos; 2nd, heatmap; 3rd, hotspot; 4th, gaze sequence. (Source of the original photo on the top right side: [45]).
Figure 11. The VAS analysis of both AI-generated and original images of the Dome of the Rock. Rows: 1st, original photos; 2nd, heatmap; 3rd, hotspot; 4th, gaze sequence. (Source of the original photo on the top right side: [46]).
Figure 12. The VAS analysis of AI-generated and original images of the Minaret of Ibn Tulun Mosque in Cairo, Egypt. Rows: 1st, original photos; 2nd, heatmap; 3rd, hotspot; 4th, gaze sequence. (Source of the original photo on the top right side: [47]).
Figure 13. The VAS analysis of AI-generated and original images of the Mosque of Ibn Tulun. Rows: 1st, original photos; 2nd, heatmap; 3rd, hotspot; 4th, gaze sequence. (Source of the original photo on the top right side: Authors, 2024).
Figure 14. The results of questions 1–4 of the questionnaire: rate the similarity between (A) the AI-generated image and (B) the actual photo. (Source of the original photos on the right side: [45,46,47]).
Figure 15. The results of question 5 of the questionnaire: the following images show minarets as critical architectural elements in Islamic sub-regions. Please attempt to recognize the region for each image (A–D).
Table 1. A summary of the limitations of Midjourney in generating representations of Islamic architectural heritage and the factors of these limitations.

Limitations | Factors
Limits of the prompt
    Length | Long / medium / short
    Language | English, Arabic, etc.
    Numeracy | One, two, three, etc.
    Controllability | Controllable / less controllable / uncontrollable
Limits of fame | Famous / less famous
Limits of regionality and historical styles | Arabia, Levant, North Africa, Far East, etc.; Early Islamic, Ottoman, Mamluk, etc.
Limits of architectural and urban elements and details | Calligraphy, arabesque, ornaments, etc.
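As a practical companion to Table 1, the same taxonomy can be encoded as a small data structure so that practitioners can run a structured checklist over a batch of generated images. The class, field names, and example identifier below are an illustrative sketch derived from the table, not an instrument used in the study.

```python
# A sketch encoding Table 1's limitation taxonomy as a reusable review checklist.
# Class and field names are illustrative, not part of the study's instruments.

from dataclasses import dataclass

@dataclass
class LimitationCheck:
    category: str        # one of the four limitation groups in Table 1
    factors: list[str]   # the factors listed against that group
    notes: str = ""      # reviewer's observations for a given image

TABLE_1_TAXONOMY = [
    LimitationCheck("Limits of the prompt",
                    ["length (long/medium/short)", "language (English, Arabic, etc.)",
                     "numeracy", "controllability"]),
    LimitationCheck("Limits of fame", ["famous / less famous"]),
    LimitationCheck("Limits of regionality and historical styles",
                    ["sub-region (Arabia, Levant, North Africa, Far East, etc.)",
                     "historical style (Early Islamic, Ottoman, Mamluk, etc.)"]),
    LimitationCheck("Limits of architectural and urban elements and details",
                    ["calligraphy", "arabesque", "ornaments"]),
]

def review(observations: dict[str, str]) -> list[LimitationCheck]:
    """Return a fresh checklist with per-category notes attached for one image."""
    return [LimitationCheck(item.category, list(item.factors),
                            observations.get(item.category, ""))
            for item in TABLE_1_TAXONOMY]

if __name__ == "__main__":
    image_id = "dome_of_the_rock_v6_001"  # hypothetical identifier for one generated image
    checklist = review({"Limits of fame": "well-known monument, reproduced closely"})
    print(f"Review of {image_id}:")
    for item in checklist:
        print(f"  {item.category}: {item.notes or 'no issues noted'}")
```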
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
