Article

Design of Polymeric Membranes for Air Separation by Combining Machine Learning Tools with Computer Aided Molecular Design

by Jie-Ying Cheun 1, Joshua-Yeh-Loong Liew 1, Qian-Ying Tan 1, Jia-Wen Chong 1, Jecksin Ooi 2 and Nishanth G. Chemmangattuvalappil 1,*
1 Department of Chemical & Environmental Engineering, University of Nottingham Malaysia, Jalan Broga, Semenyih 43500, Malaysia
2 School of Engineering and Physical Sciences, Heriot-Watt University Malaysia, No. 1, Jalan Venna P5/2, Precinct 5, Putrajaya 62200, Malaysia
* Author to whom correspondence should be addressed.
Processes 2023, 11(7), 2004; https://doi.org/10.3390/pr11072004
Submission received: 18 May 2023 / Revised: 29 June 2023 / Accepted: 30 June 2023 / Published: 4 July 2023
(This article belongs to the Section Chemical Processes and Systems)

Abstract

The growing importance of membrane-based air separation processes has increased the demand for suitable polymeric membrane structures. This has spurred interest in designing polymer structures for O2/N2 separation through a systematic approach. In this work, a computer-aided molecular design (CAMD)-based framework was developed to identify promising polymer structures for air separation. To incorporate constraints in CAMD, the rough set-based machine learning (RSML) method was chosen for its interpretability and used to establish predictive models for the physical and transport properties of polymers. The deterministic rules generated by RSML were interpreted against known structure–property relationships to ensure that the generated molecules were scientifically feasible. The most prominent rules were then integrated as constraints in CAMD. The relevant properties in this framework comprised glass transition temperature (Tg), molar volume (Vm), cohesive energy (Ecoh), O2 permeability and O2/N2 selectivity. The solutions from CAMD optimisation were demonstrated in case studies. The results indicate that this novel approach can identify potential polymeric membrane candidates for air separation that meet the permeability and selectivity requirements.

1. Introduction

Polymeric membrane separation has transitioned from a laboratory curiosity to a commercial reality for the separation of common gases, and it is gaining popularity over established commercial processes based on adsorption, cryogenic distillation and amine absorption. The global gas separation membrane market expanded from USD 1.88 billion in 2022 to USD 2.09 billion in 2023 at a compound annual growth rate (CAGR) of 11.3% [1]. The market is forecast to grow further to USD 3.01 billion in 2027 at a CAGR of 9.5% [2]. This growth is attributed to the simplicity of operation, lower energy costs, smaller footprint and viable economics that membranes offer compared with distillation and adsorption; thus, they are extensively used in petrochemical industries, ammonia plants, natural gas processing units and air separation fields [2].
One of the main focuses of membrane-based applications is the air separation process, which is of great significance to the chemical industry for producing enriched oxygen and nitrogen. Air separation by membranes makes up approximately USD 155 million of the overall membrane gas separation business [3]. Membrane performance depends greatly on its properties, including tensile strength, selectivity and permeability. Therefore, efforts are being made to synthesise novel polymers with better separation properties and to understand the optimum properties of a polymeric membrane that separates oxygen and nitrogen in air. Ideally, commercially viable membranes should possess superior selectivity and permeability in addition to mechanical stability for a long operating life. However, feasible membranes with such features have yet to be applied in large-scale industrial air separation processes [4].
The design and screening of suitable polymeric membrane materials is a resource-intensive process and depends heavily on the available polymer property databases. Moreover, few complete databases have been reported on structural properties, membrane permeability and selectivity towards O2/N2. Traditional methods of developing a new polymeric membrane structure require assessing a vast number of chemical compounds prior to the final selection, which increases the possibility of overlooking potential polymer structures. To resolve such challenges, a reverse formulation approach known as computer-aided molecular design (CAMD) was employed in this study. It is an effective tool for the design and screening of molecules, as it identifies the molecular structures that match a given set of chemical, physical and structural properties. The prerequisite for CAMD modelling is the development of property prediction models based on topological, structural, physicochemical and electronic descriptors that are closely related to polymer molecules. Prediction models were therefore developed using machine learning algorithms to relate polymer properties to their molecular structures, which are represented by topological indices.

1.1. Rough Set Machine Learning (RSML)

Machine learning (ML) is a discipline of artificial intelligence (AI) focusing on the use of computational algorithms that are designed to emulate human intelligence by exploring patterns in data for future prediction [5]. Nevertheless, most modelling approaches that employ artificial neural network (ANN), support vector machine (SVM) and Gaussian process regression (GPR) methods require large datasets. If these methods are used without a sufficiently large dataset, the resulting ML models for polymer design will be unreliable [6]. Furthermore, both ANN and SVM algorithms are black-box treatments by nature, which makes it challenging to learn the principles behind the models. For a plausible gas permeability prediction, it is necessary to interpret the influence of a specific property on the prediction [7]. Among the ML techniques, rough set ML (RSML) is reported to have the benefit of interpretability, which supports decisions based on scientific reasoning [8]. Therefore, RSML, introduced by Pawlak in the early 1980s, was chosen for this study owing to its ability to deal with vagueness, inconsistency and uncertainty [9]. The main advantage of rough set theory is that it requires no preliminary information about the data, such as a statistical probability distribution, grade of membership or fuzzy set possibility value.
Rough set theory is proven to be useful in real-life applications and it is applicable for the case when the data are limited. It has compelling applications in multiple fields such as medicine, engineering design, business, pharmacology, decision analysis, banking and others. It works in the form of rules which are induced by learning from training examples [9]. RSML produces a set of decision rules from the specified attributes covering all the training data examples with the derived certainty, strength and coverage. RSML has been implemented to determine secure geological reservoirs for the minimisation of CO2 emissions by analysing data from storage sites. Predictive models generated from RSML showed similar results with site selection rules which were established according to proficient knowledge [10]. One of the recent works employed RSML for the prediction of energy consumption within a building, where it was used to eliminate redundant influencing factors prior to the identification of crucial component contributing to the energy consumption [11]. RSML was also used to develop models to predict the fragrance of molecules based on their chemical structure [12]. In another contribution, RSML has been successfully used to identify the optimal operating conditions to produce bio-oil from biomass through fast pyrolysis [13].
Generally, RSML is useful to establish the concealed order in a dataset for the generation of decision algorithms, classification of data and discovering cause–effect relationships of the attributes and decision rules [9]. The concept of indiscernibility relation, which is correlated with a set of attributes, applies to RSML. The variables will be classified into conditional attributes (inputs) and decision attributes or objects (outputs). Suppose U is a finite set of objects, also called the universe, and A is a finite set of attributes. For each attribute $a \in A$, the value set is associated with $V_a$ [14]. Every attribute a establishes a function as shown in Equation (1). With each subset of attributes B of A, the indiscernibility relation on U is defined and denoted as I(B) in Equation (2).
$$f_a : U \rightarrow V_a \tag{1}$$
$$I(B) = \{(x, y) \in U \times U : f_a(x) = f_a(y),\ \forall a \in B\} \tag{2}$$
RSML can also be used to approximate any vague or indefinite concept. The approximation is expressed through lower and upper approximations: the former comprises all objects that surely belong to the concept, while the latter contains all objects that possibly belong to it. The approximations are presented in Equations (3) and (4), where U is the universe, X is a subset or concept of the universe and B is a subset of A.
$$\underline{B}(X) = \{x \in U : B(x) \subseteq X\} \tag{3}$$
$$\overline{B}(X) = \{x \in U : B(x) \cap X \neq \emptyset\} \tag{4}$$
where $\underline{B}(X)$ is the B-lower approximation and $\overline{B}(X)$ is the B-upper approximation of X.
The B-boundary region of X, $BN_B(X)$, is the difference between the upper and lower approximations, as defined in Equation (5).
$$BN_B(X) = \overline{B}(X) - \underline{B}(X) \tag{5}$$
The accuracy of the approximation is characterised numerically by the coefficient $\alpha_B(X)$ in Equation (6).
$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|} \tag{6}$$
Rough membership is another concept in RSML that considers the uncertainty of the elements in the universe, i.e., a vague concept with boundary line situations. Therefore, the uncertainty is coupled with the membership of a set of elements to form a rough membership function as described in Equations (7) and (8). $\mu_X^B(x)$ is the membership function depicted as a conditional probability, also interpreted as the degree of uncertainty to which x belongs to X [14].
$$\mu_X^B(x) = \frac{|X \cap B(x)|}{|B(x)|} \tag{7}$$
$$\mu_X^B(x) \in [0, 1] \tag{8}$$
In terms of approximation cases, rough membership function is shown in Equations (9)–(11). It is evident from the equations that a strict relation is present between uncertainty and vagueness in RSML. Vagueness is thus associated with sets while uncertainty is associated with the elements of sets.
$$\underline{B}(X) = \{x \in U : \mu_X^B(x) = 1\} \tag{9}$$
$$\overline{B}(X) = \{x \in U : \mu_X^B(x) > 0\} \tag{10}$$
$$BN_B(X) = \{x \in U : 0 < \mu_X^B(x) < 1\} \tag{11}$$
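The approximation concepts in Equations (2)–(11) can be illustrated with a minimal Python sketch. The information table, the attribute names C1 and C2 and the concept X below are hypothetical and already discretised; they are not data from this work, and the helper names are chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical, already-discretised information table (illustration only).
TABLE = {
    "P1": {"C1": "low", "C2": "high"},
    "P2": {"C1": "high", "C2": "high"},
    "P3": {"C1": "high", "C2": "high"},
    "P4": {"C1": "low", "C2": "low"},
    "P5": {"C1": "high", "C2": "low"},
}


def equivalence_classes(table, attrs):
    """Blocks B(x) induced by the indiscernibility relation I(B), Eq. (2)."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def approximations(table, attrs, concept):
    """Return the B-lower and B-upper approximations of a concept, Eqs. (3)-(4)."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= concept:      # B(x) is a subset of X
            lower |= block
        if block & concept:       # B(x) intersects X
            upper |= block
    return lower, upper


def rough_membership(table, attrs, concept, x):
    """Rough membership of object x in the concept, Eq. (7)."""
    block = next(b for b in equivalence_classes(table, attrs) if x in b)
    return len(block & concept) / len(block)


B = ["C1", "C2"]
X = {"P1", "P2"}                       # hypothetical concept, e.g. one decision class
lower, upper = approximations(TABLE, B, X)
boundary = upper - lower               # Eq. (5)
accuracy = len(lower) / len(upper)     # Eq. (6)
print(sorted(lower), sorted(upper), sorted(boundary), round(accuracy, 3))
```

In this toy table P2 and P3 are indiscernible, so P2 cannot be assigned to X with certainty and both objects fall in the boundary region.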
RSML also introduces reduct generation. A reduct is a minimal subset of attributes that allows the same categorisation of elements in the universe as the entire set of attributes and thus preserves the indiscernibility relation in the system. Another significant property is the core, which represents the key attributes shared by all reducts, i.e., it is the intersection of every reduct and is included in all reducts. None of the core elements can be removed without influencing the classification [14]. Since more than one reduct may be generated from a single dataset, further evaluation should be conducted to select the reducts that fulfil certain requirements.
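As a rough illustration of reducts and the core, the sketch below enumerates attribute subsets by brute force and keeps the minimal ones that reproduce the partition of the full attribute set. The table is hypothetical, and the exhaustive search is an assumption made for clarity; it is not the algorithm used by ROSE2 and is only practical for a handful of attributes.

```python
from itertools import combinations

# Hypothetical discretised table (illustration only); C3 duplicates C2.
TABLE = {
    "P1": {"C1": 0, "C2": 0, "C3": 0},
    "P2": {"C1": 0, "C2": 1, "C3": 1},
    "P3": {"C1": 1, "C2": 0, "C3": 0},
    "P4": {"C1": 1, "C2": 1, "C3": 1},
}


def partition(table, attrs):
    """Partition of the universe induced by indiscernibility on attrs."""
    blocks = {}
    for obj, row in table.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return {frozenset(block) for block in blocks.values()}


def find_reducts(table, attrs):
    """Minimal attribute subsets that preserve the full partition."""
    target = partition(table, attrs)
    reducts = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Skip supersets of an already-found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if partition(table, subset) == target:
                reducts.append(subset)
    return reducts


def find_core(reducts):
    """The core is the intersection of all reducts."""
    return set.intersection(*(set(r) for r in reducts)) if reducts else set()


reducts = find_reducts(TABLE, ["C1", "C2", "C3"])
core = find_core(reducts)
print(reducts, core)
```

Here C2 and C3 carry the same information, so either can be paired with C1 to form a reduct, while C1 appears in every reduct and therefore forms the core.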
To further analyse the decisions and rules generated by RSML, certainty (cerx), strength (σx) and coverage (covx) are computed. Certainty, which is equivalent to precision, is the fraction of objects fulfilling the conditions, C, that also have the decision, D. A high certainty implies greater confidence that a molecule or element is assigned to the right decision class. Strength is the fraction of the entire dataset that is classified under a given rule. Coverage, which is also known as recall, is the fraction of objects in the decision class, D, that fulfil the conditions, C. Strength and coverage are both regarded as measures of the generalisation power of a rule. The overall accuracy of the models, which is the number of correct predictions over the total number of predictions, was also estimated. These parameters are used in this work to select the most prominent rules quantitatively [15]. Ideally, decision rules with both high certainty and high coverage are desired as predictive models [15].

1.2. Computer-Aided Molecular Design (CAMD)

CAMD is a reverse engineering methodology that is capable of predicting, estimating and designing molecules that match a predetermined set of target properties [16]. In CAMD, predetermined target properties and feasibility conditions are expressed as constraints in the problem formulation. An optimization model is then used to generate the best possible molecular structure, based on an objective function that seeks to either maximise or minimise some of the desired properties [17]. Usually, most CAMD approaches rely on property prediction models, such as group contribution (GC) methods to predict a molecule’s properties, which are then used to assess how well the molecule meets the desired set of properties [18]. Other property prediction models employed in CAMD involve using topological indices (TIs) as molecular descriptors that rely on quantitative structure–property relationship (QSPR) for property estimation [19].
Numerous systematic frameworks, procedures, and algorithms based on the CAMD approach have been developed extensively in recent years. For instance, Wang and Cheng introduced a CAMD framework to identify a suitable bio-compatible solvent for the extractive fermentation and separation process [20]. This CAMD problem was formulated as a multi-objective optimisation (MOO) problem whereby the goal was to simultaneously optimise several targets, which included maximising the production rate, extraction efficiency, and limiting solvent usage. A MOO approach using CAMD had been used to identify effective solvents to extract palm oil from palm-pressed fibre [21]. The method optimized multiple targets such as performance targets, safety, health and environmental objectives using Fuzzy Analytic Hierarchy Process. A COSMO-CAMD framework was developed [22] to design solvents for liquid–liquid extraction processes of phenol and hydroxymethylfurfural from water. This framework includes quantum mechanical information incorporated into CAMD, which predicts properties more accurately and independently of experimental data.
Aside from the vast application of CAMD in solvent design, it is also widely applied in various chemical product design applications. Yee et al. created a systematic framework for designing personal care products that integrate safety, health, and performance considerations into the CAMD formulation [23]. By placing limitations on safety and health risks during CAMD, they were able to generate molecules that were less toxic while still exhibiting outstanding product performance. CAMD approaches have also been used in the design of polymeric membranes based on group contribution methods [24] and molecular dynamics [25]. In these approaches, a set of desirable properties of the polymer has been targeted and the polymeric structures that meet those properties have been generated. There were a few recent studies in CAMD that focused on the development of fragrance products. For instance, a recent methodology involved the use of a series of MINLP models for screening and designing fragrances in shampoo using CAMD, which eliminated molecules that did not meet specific constraints and fragrance design properties [26]. In another study, an enhanced hyperbox ML approach was integrated with CAMD to generate rules that were used to create fragrance property prediction models [27]. Similarly, a CAMD framework was developed to facilitate the design of fragrance molecules using a rough set-based machine learning (RSML) model to generate constraints for the prediction of odour properties [28]. The hybrid CAMD-ML approach resulted in a diverse array of feasible compounds that met structural and physical property requirements. Both studies demonstrated the effectiveness of CAMD in generating potential fragrant molecules for consumer products. For more comprehensive information on the latest developments in this area, readers can refer to review articles by Chemmangattuvalappil et al. [29] and Zhang et al. [30].
One of the most important prerequisites for developing a polymeric membrane is the identification of structures that possess the attributes needed for this application. Although the CAMD-based polymer design methodology can effectively determine polymer structures with desirable properties, there is a need to develop predictive models for various polymer-based products to cater for wider applications [24]. In this work, rough set machine learning (RSML) was employed to develop reliable predictive models for polymer properties based on topological indices (TIs). Since there are no existing models relating TIs to O2/N2 membrane separation characteristics, TIs are used here as structural descriptors for property correlations in predicting polymer properties for air separation. After screening the rules generated from RSML, the prominent rules are integrated into CAMD as property constraints. Classification models were established for physical properties, including glass transition temperature (Tg), molar volume (Vm) and cohesive energy (Ecoh), and for transport properties, namely permeability and selectivity, in view of their impact on polymeric membrane functionality and O2/N2 separation feasibility. The polymeric membrane design problem in CAMD was formulated to maximise the tensile strength of the polymeric membrane structure.

2. Methodology

Figure 1 illustrates the methodology developed to design air separation polymeric membrane molecules by employing RSML and CAMD tools. The entire work is separated into 4 main steps, starting with the identification of significant attributes impacting polymeric membranes, followed by establishment of property prediction models and implementation of CAMD model for the membrane design and lastly design model verification.
  • Step 1: Polymer attributes/properties identification
For air separation applications, the role of polymeric membranes is to ensure effective separation between oxygen and nitrogen. Technical requirements for the polymer attributes were separated into physical and transport properties. The essential physical properties to be fulfilled are the fundamental polymer properties to function at normal operating conditions which are glass transition temperature, molar volume and cohesive energies. Glass transition temperature (Tg) indicates the temperature region where polymer changes from rigid “glassy” state to flexible “rubbery” state which is undesired in this case since change in polymer physical state will affect polymer chain flexibility and separation efficiency. Glassy state membrane generally has higher permeabilities to gases as compared to rubbery polymer with higher permeabilities for organic solvents [31]. On the other hand, molar volume (Vm) is related to the fractional free volume of a polymeric membrane representing the free space and accessible volume in membrane model affecting transport behaviour of small gas molecules, i.e., membrane diffusivity–separation properties [32]. Cohesive energy (Ecoh) is the energy required to break all intermolecular physical links per mole of polymer which also indicates dispersion, polar and hydrogen bonding interactions [33]. For transport properties, permeability and selectivity are important factors in the selection of polymeric membrane structures. Permeability defines the speed at which the gas molecules transport across the membrane whilst selectivity indicates the separation degree of the target gas molecules from other molecules [32].
  • Step 2: Development of property prediction models for estimating properties
RSML was utilised to develop predictive models for physical and transport properties using topological indices. Though there are existing models for the physical properties, it has been reported that the accuracy for Tg and Ecoh models is relatively low [34]. Moreover, complete polymer data available for both the properties and their respective topological indices are not abundant, making RSML suitable to be applied in addition to its interpretability. Furthermore, the classification predictive model for each property is more relevant in this case as it determines the class to which a polymer’s property belongs, thereby determining its suitability for air separation applications.
  • Step 2.1: Database and properties classification
A polymer database was required to train the ML models. The databases established by Van Krevelen and Te Nijenhuis [35] and Bicerano [33], which contain polymer physical property information, were used in this work. Gas permeability and selectivity information, comprising approximately 60 entries, was obtained from Jia and Xu [36]. All the properties are quantitative and were input directly to build the information system. Each physical and transport property was categorised into 2 classes in view of the data range and availability. The lower boundary of Tg was set at 300 K since approximately 67% of the Tg values in the original dataset fall above 300 K, which is the typical temperature region for normal operating conditions. Following a similar concept, the boundary for Vm was set at 100 cm3/mol and that for Ecoh at 35,000 J/mol. Polymers with Tg higher than 300 K, Vm greater than 100 cm3/mol and Ecoh lower than 35,000 J/mol were desired, as higher cohesive energy increases the chain density, making molecule permeation harder [37].
Oxygen separation membranes are attractive if membrane’s selectivity ranges from 4 to 6 while oxygen permeability is more than 10 Barrers [38]. In order to achieve more extensive penetration and obtain higher product purity, higher membrane selectivity and permeability values are desired. Therefore, the lower boundary for membrane permeability was set at 10 Barrers whilst O2/N2 selectivity was set at 4.
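The two-class discretisation described in this step can be sketched as follows. The boundary values are the ones quoted in the text; the dictionary keys, the function name and the example polymer values are hypothetical, and assigning a value exactly on the boundary to Class 2 is an assumption made here for all five properties.

```python
# Class boundaries quoted in the text; key names are illustrative only.
BOUNDARIES = {
    "Tg_K": 300.0,
    "Vm_cm3_per_mol": 100.0,
    "Ecoh_J_per_mol": 35_000.0,
    "O2_permeability_barrer": 10.0,
    "O2_N2_selectivity": 4.0,
}


def to_class(prop, value):
    """Class 1 below the boundary, Class 2 at or above it (assumed convention)."""
    return 1 if value < BOUNDARIES[prop] else 2


# Hypothetical polymer record, not taken from the databases used in this work.
polymer = {
    "Tg_K": 378.0,
    "Vm_cm3_per_mol": 86.0,
    "Ecoh_J_per_mol": 31_000.0,
    "O2_permeability_barrer": 12.5,
    "O2_N2_selectivity": 3.6,
}

classes = {prop: to_class(prop, value) for prop, value in polymer.items()}
print(classes)
```

Note that the desirable class is not the same for every property: Class 2 is preferred for Tg, Vm, permeability and selectivity, whereas Class 1 (below 35,000 J/mol) is preferred for Ecoh.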
  • Step 2.2: Representation of monomer molecules using topological indices
Topological indices (TIs) used in this study were zeroth and first-order connectivity indices including its valency, electro-topological state (E-state) index and shape indices including Kappa Order 1 to 3 ($^1\kappa$, $^2\kappa$, $^3\kappa$) as well as Kappa Alpha Order 1 to 3 ($^1\kappa_\alpha$, $^2\kappa_\alpha$, $^3\kappa_\alpha$) and Kappa Flexibility Index. They were utilised to represent numerically the monomer’s physical properties in terms of structural aspects such as cross-linking, bond types, branching, electronic information, etc. The connectivity indices were incorporated since they provide information about the number of non-hydrogen atoms and bonding type of the polymer molecules that will influence polymer physical properties such as Tg, where more non-hydrogen atoms result in higher Tg.
Kier and Hall’s electro-topological state (E-state) index incorporates both electronic information and molecular topology to describe the chemical structure at atomic level which is useful in this case. It is the sum of intrinsic state of atom and perturbation factor depicting the influence of the remaining atoms in the molecule [39]. On the other hand, Kappa shape indices characterise molecular structure quantitatively and take into account spatial density. Kappa shape index also encodes information about size, degree of cyclicity and degree of separation in branching [40]. By incorporating the shape indices, it provides insights into polymer cross-linking, degree of branching and molecule sizes, which will affect the polymer permeability and selectivity. First order Kappa shape index, $^1\kappa$, is defined by one bond fragment counts, $^1P$, where only linear graph is considered. Second order shape index, $^2\kappa$, is described by two-bond paths, $^2P_i$, with two shape extremes of $^2P_{max}$ and $^2P_{min}$ [41]. Likewise, count of paths of three adjacent bonds, $^3P_i$, configures the basis for the quantification of third order shape index, $^3\kappa$. Inclusion of alpha in shape indices demonstrates the influence of covalent radius on molecular shape, giving more information about the polymer structure. Lastly, incorporation of flexibility index is based on the role of molecular size, cycles, branching and heteroatom content. The flexibility index is made equivalent to the product of first and second Kappa order, normalised to the number of atoms in the graph [41].
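To make the shape indices concrete, the sketch below evaluates the first- and second-order Kappa indices and a flexibility-type index for two small hydrogen-suppressed graphs. It assumes Kier's standard closed forms, $^1\kappa = A(A-1)^2/(^1P)^2$ and $^2\kappa = (A-1)(A-2)^2/(^2P)^2$, where A is the number of non-hydrogen atoms, and it omits the alpha correction (equivalent to alpha = 0, as for an all-sp3 carbon skeleton). The values used in this work were taken from TEST, so this code is only an illustration of the definitions, with function names chosen here.

```python
def path_counts(adjacency):
    """Return (A, 1P, 2P) for a hydrogen-suppressed molecular graph."""
    atoms = len(adjacency)
    one_bond = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    # Each pair of bonds meeting at an atom forms one two-bond path.
    two_bond = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adjacency.values())
    return atoms, one_bond, two_bond


def kappa1(atoms, one_bond):
    """First-order Kappa shape index (no alpha correction)."""
    return atoms * (atoms - 1) ** 2 / one_bond ** 2


def kappa2(atoms, two_bond):
    """Second-order Kappa shape index (no alpha correction)."""
    return (atoms - 1) * (atoms - 2) ** 2 / two_bond ** 2


def flexibility(k1, k2, atoms):
    """Product of first- and second-order Kappa, normalised by atom count."""
    return k1 * k2 / atoms


# Four-atom linear skeleton (n-butane-like) and branched skeleton (isobutane-like).
linear = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
branched = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

results = {}
for name, graph in (("linear", linear), ("branched", branched)):
    a, p1, p2 = path_counts(graph)
    k1, k2 = kappa1(a, p1), kappa2(a, p2)
    results[name] = (k1, k2, flexibility(k1, k2, a))
print(results)
```

Both skeletons have the same $^1\kappa$, but branching lowers $^2\kappa$ and hence the flexibility index, which is the kind of structural information the shape indices contribute to the property models.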
All the topological indices information was extracted from Toxicity Estimation Software Tool (TEST). The software estimates the toxicity as well as topological indices of chemical structure using QSAR methods. Moreover, the tool eases information collection where the toxicity values and other relevant molecular information will be presented once user inputs the respective molecular structure [42].
  • Step 2.3: Construction of predictive models using RSML
The physical properties identified and the data tabulated for them in the previous steps were defined as the decision attributes. The topological indices, which are the structural descriptors, were defined as the condition attributes. The polymers were selected from the available databases with complete TI information; for instance, 194 polymers were utilised for generating the Tg predictive model. Of the entire dataset, 70% was used for training, while 15% each was used for validation and testing. Of the 194 polymers, 65 have Tg less than 300 K and were classified as Class 1, while the remaining 129 polymers have Tg equal to or more than 300 K and were classified as Class 2. However, the complete datasets collected for permeability and selectivity were much smaller, with only 62 and 53 data points, respectively. Owing to this limited data availability, 70% of the permeability and selectivity data were used for training and the remaining 30% for validation; there was insufficient data for testing, so only validation was performed. As mentioned above, polymers with permeability below 10 Barrers were assigned to Class 1 and those above 10 Barrers to Class 2. A similar approach was used for selectivity categorisation, with O2/N2 selectivity less than 4 classified as Class 1 and equal to or more than 4 classified as Class 2. Likewise, the other physical properties, Vm and Ecoh, were also classified into 2 classes with the boundaries stated in Step 2.1.
All the conditional attributes are composed of topological indices which are continuous attributes whereas decision attributes consisted of all the physical properties that have been classified into classes; hence, decision attributes are integer attributes. There were 5 information tables constructed since 5 physical properties predictive models were aimed to be developed. Table 1 tabulates the layout of a simplified information table where C1, C2 and C3 are continuous attributes.
Once the information system was populated with complete data, attribute reduction was executed to remove redundant attributes. As shown in Table 1, $U = \{P_1, P_2, \ldots, P_5\}$ is the finite non-empty set of polymers whilst $R = \{C_1, C_2, C_3\}$ represents the attribute set. Indiscernibility (I) indicates the polymers that have the same conditional attribute values. The indiscernibility relations for the complete relation R and for C1&C2, C1&C3 and C2&C3 are shown in Equations (12)–(15). From Equations (13)–(15), removal of any one of C1, C2 or C3 from relation R has no effect on the table: it still results in the same classification as the original information table. Therefore, C1, C2 and C3 are each dispensable in this case.
$$I(R) = \{\{P_1\}, \{P_2, P_3\}, \{P_4\}, \{P_5\}\} \tag{12}$$
$$I(R - \{C_3\}) = I(R) \tag{13}$$
$$I(R - \{C_2\}) = I(R) \tag{14}$$
$$I(R - \{C_1\}) = I(R) \tag{15}$$
In this context, classification generated by all the 3 conditional attributes C1, C2 and C3 is identical to the classification of C1&C2, C1&C3 and C2&C3. In order to determine the reducts of R, pairs of attributes C1&C2, C1&C3 and C2&C3 are to be checked if they are independent. Since $I(C_1 \& C_2) \neq I(C_1)$ and $I(C_1 \& C_2) \neq I(C_2)$, pairs of C1&C2 are independent. Likewise, C1&C3 and C2&C3 are also determined to be independent. Therefore, the reducts of R are found to be {C1}, {C2} and {C3}. The core is not present in this example since there is no attribute which is the intersection of all reduct sets. Moreover, there are no superfluous attributes which can be omitted in this example since each attribute is a standalone reduct, i.e., each reduct is capable of determining the classification of the system. Subsequently, each reduct was applied to derive a set of rules. For example, attributes C2 and C3 were omitted during rules generation for reduct {C1}. Similar approach was implemented for the other reducts. Three rules were generated in this example:
  • Rule 1: (C1 < 6.8) → (D1 = 1)
  • Rule 2: (C2 ≥ 6.85) → (D1 = 2)
  • Rule 3: (C2 ≥ 4.15) → (D1 = 2)
In this work, there were 12 conditional attributes in the completed information table and 2 classes in each decision attribute to distinguish between normal or robust and unusual or less desired physical properties. A section of the permeability information table is given in Appendix A, Table A1. Similar steps were employed to generate decision rules by deriving reducts from the training data. Reducts and rules were generated using the software Rough Set Data Explorer (ROSE2) [43]. The rules generated were then validated against the validation dataset by evaluating certainty (cerx), strength (σx) and coverage (covx). These parameters are defined in Equations (16)–(18), by letting S = (U,C,D). The final rules were selected after evaluation against the testing dataset. The testing data were sourced from other references to avoid bias. Generally, the benchmark for coverage and certainty was set at higher than 70% so that only rules with high coverage and certainty were selected. The finalised rules were used as property constraints in CAMD. Results from ROSE2 and the variation in rule selection are discussed in Section 3.1.2.
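$$\sigma_x(C, D) = \frac{\mathrm{supp}_x(C, D)}{\mathrm{card}(U)} \tag{16}$$
$$\mathrm{cer}_x(C, D) = \frac{\mathrm{card}(C(x) \cap D(x))}{\mathrm{card}(C(x))} = \frac{\mathrm{supp}_x(C, D)}{\mathrm{card}(C(x))} = \frac{\sigma_x(C, D)}{\pi(C(x))} \tag{17}$$
$$\mathrm{cov}_x(C, D) = \frac{\mathrm{card}(C(x) \cap D(x))}{\mathrm{card}(D(x))} = \frac{\mathrm{supp}_x(C, D)}{\mathrm{card}(D(x))} = \frac{\sigma_x(C, D)}{\pi(D(x))} \tag{18}$$
where $\pi(C(x)) = \frac{\mathrm{card}(C(x))}{\mathrm{card}(U)}$ and $\pi(D(x)) = \frac{\mathrm{card}(D(x))}{\mathrm{card}(U)}$.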
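Equations (16)–(18) can be evaluated for a single rule with a few lines of Python. The sketch below scores the example rule (C1 < 6.8) → (D1 = 1) on a small hypothetical dataset; the records and the function name are illustrative and are not validation data from this work.

```python
def rule_metrics(records, condition, decision):
    """Return (strength, certainty, coverage) of a rule, Eqs. (16)-(18)."""
    total = len(records)                                   # card(U)
    n_condition = sum(1 for r in records if condition(r))  # card(C(x))
    n_decision = sum(1 for r in records if decision(r))    # card(D(x))
    support = sum(1 for r in records if condition(r) and decision(r))
    strength = support / total
    certainty = support / n_condition if n_condition else 0.0
    coverage = support / n_decision if n_decision else 0.0
    return strength, certainty, coverage


# Hypothetical (C1 value, decision class) pairs, for illustration only.
RECORDS = [
    (5.0, 1), (6.0, 1), (6.5, 1), (6.7, 2), (7.0, 1),
    (7.5, 2), (8.0, 2), (8.2, 2), (9.0, 2), (9.5, 2),
]

strength, certainty, coverage = rule_metrics(
    RECORDS,
    condition=lambda r: r[0] < 6.8,   # C1 < 6.8
    decision=lambda r: r[1] == 1,     # D1 = 1
)
print(strength, certainty, coverage)
```

With these toy records the rule is supported by 3 of 10 objects, and both certainty and coverage are 0.75, so the rule would pass a 70% benchmark on both measures.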
  • Step 3: Design of air separation polymeric membrane molecules using CAMD model
In the optimisation model, the objective function was to maximise the tensile strength of the polymer. High tensile strength is essential so that the polymeric membrane can withstand mechanical stresses during operation and maintain its integrity to allow for appropriate gas flux across the membrane [44]. Property constraints derived from the RSML decision rules were included in the CAMD model.
  • Step 3.1: Formulation of structural constraints
In order to generate a feasible molecular structure, structural constraints were included in the CAMD model so that molecules do not violate basic feasibility criteria such as the octet rule. The molecules should not contain any free bonds, unattached sites or multiple bonds attached to the same site. First, suitable first-order molecular groups were selected that may potentially form the building blocks of a monomer molecule. The first-order molecular groups chosen are listed in Appendix B, Table A2. Linear structural constraints were developed using integer variables based on the algorithms developed by Churi and Achenie [16]. Let m be the number of structural groups, vk be the valence of the kth group and smax the maximum valence of all the groups in the basis set. Moreover, n is the number of groups in the designed structure and nmax is the maximum number of groups allowed in a molecule. The m structural groups, with valences vk and maximal valence smax, were specified at a reasonable nmax. The lower limit of nmax is 2, as this is the minimum number of groups needed to form a molecule. The actual number of groups, n, is then obtained from the mathematical programming model. In this study, the parameters were
m = 12
vk = [1 1 1 1 3 3 1 1 3 2 2 2 ]
smax = max {vk} = 3
nmax = 14
The structural constraints use three sets of binary variables, u, z and w, defined with indices i, j, k and p. Indices i and p denote the positions of structural groups in the designed molecule, index k specifies the type of functional group and j denotes the site through which the ith group is attached to the pth group. As shown in Equation (19), uik indicates whether the ith position is occupied by the kth group, and it restricts each position i to only one group k. The octet rule is enforced by Equation (20), which ensures that the number of bonds connected to a group corresponds to its valency; zijp indicates whether the ith group is attached to the pth group via the jth site.
$$\sum_{i=1}^{n_{max}} \sum_{k=1}^{m} u_{ik} \leq n_{max} \tag{19}$$
$$\sum_{p=1}^{n_{max}} \sum_{j=1}^{s_{max}} z_{ijp} = \sum_{k=1}^{m} u_{ik} v_k \, ; \quad i = 1, \ldots, n_{max} \tag{20}$$
Equation (21) constrains the ith group to be attached to one of the groups before it, i.e., one of the groups in positions 1 to (i − 1). The variable w in the equation is also a binary vector, which signifies the valence site; thus, its first and second terms are zero since those positions will be occupied. The presence of the first group is enforced in Equation (23), and the (i + 1)th group is only present if the ith group is present, as defined in Equation (24), to ensure that only one molecule is formed.
$$\sum_{p=1}^{i-1} \sum_{j=1}^{s_{max}} z_{ijp} \geq w_i \, ; \quad i = 2, \ldots, n_{max} \tag{21}$$
$$\sum_{i=1}^{n_{max}} \sum_{k=1}^{m} u_{ik} + \sum_{i=1}^{n_{max}} w_i = n_{max} \tag{22}$$
$$w_1 = 0 \tag{23}$$
$$w_i \leq w_{i+1} \, ; \quad i = 1, \ldots, (n_{max} - 1) \tag{24}$$
To account for the different group valences, Equation (25) is introduced in linear form. It states that if the ith position holds a group of kind k, that group must not have any attachments at its sites (vk + 1) to smax, which do not exist. M is a number significantly larger than all the other terms in the equation and was set to 50 in this model. Furthermore, Equation (26) imposes the symmetry constraints: for instance, the first group being attached to the second group is equivalent to the second group being connected to the first. Since a group cannot be attached to itself, p is set to start from 2.
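$$\sum_{j=v_k}^{s_{max}} \sum_{p=1}^{n_{max}} z_{ijp} - \sum_{p=1}^{n_{max}} z_{i v_k p} + M u_{ik} \leq M \, ; \quad i = 1, \ldots, n_{max}, \; k = 1, \ldots, m \tag{25}$$
$$\sum_{j=1}^{s_{max}} z_{ijp} = \sum_{j=1}^{s_{max}} z_{pji} \, ; \quad i = 1, \ldots, n_{max} - 1, \; p = i + 1, \ldots, n_{max} \tag{26}$$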
Equation (27) ensures that each site of a group can be attached to another group at most once. Lastly, if the ith group exists, the (i − 1)th group must also be present, as defined in Equation (28). The structural constraints in Equations (19)–(28) are all linear and therefore define a convex feasible region once the integrality requirements are relaxed.
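$$\sum_{p=1}^{n_{max}} z_{ijp} \leq 1 \, ; \quad i = 1, \ldots, n_{max}, \; j = 1, \ldots, s_{max} \tag{27}$$
$$\sum_{k=1}^{m} u_{ik} - \sum_{k=1}^{m} u_{i-1,k} \leq 0 \, ; \quad i = 2, \ldots, n_{max} \tag{28}$$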
In addition, Equation (29) prevents free bonds from being left in the generated molecule.
$$v_k \sum_{i=1}^{n_{max}} u_{ik} - 2(n_{max} - 1) = 0 \, ; \quad k = 1, \ldots, m \tag{29}$$
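The feasibility criteria described in this step can also be checked directly on a candidate structure once the optimisation has returned one. The Python sketch below is such a post-hoc check, not the integer programming model itself: it verifies that every valence site is used exactly once (no free bonds), that no group is attached to itself and that all groups form a single molecule. The function name `is_feasible_structure` and the bond-list representation are assumptions made for this illustration.

```python
from collections import deque
from typing import List, Sequence, Tuple


def is_feasible_structure(
    valences: Sequence[int], bonds: Sequence[Tuple[int, int]]
) -> bool:
    """Check a candidate given the valence of each position and a list of bonds.

    valences[i] is the valence of the group in position i (0-based);
    bonds holds one (i, p) pair per attachment between positions i and p.
    """
    n = len(valences)
    if n < 2:  # at least two groups are needed to form a molecule
        return False
    degree = [0] * n
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for i, p in bonds:
        if i == p:  # a group cannot be attached to itself
            return False
        degree[i] += 1
        degree[p] += 1
        neighbours[i].append(p)
        neighbours[p].append(i)
    # Octet rule / no free bonds: bonds at each group must equal its valence.
    if any(d != v for d, v in zip(degree, valences)):
        return False
    # Single molecule: every position must be reachable from position 0.
    seen = {0}
    queue = deque([0])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == n


# CH3-CH2-CH2-CH3 written with group valences 1, 2, 2, 1.
print(is_feasible_structure([1, 2, 2, 1], [(0, 1), (1, 2), (2, 3)]))
```

For an acyclic molecule of n groups, passing this check implies that the group valences sum to 2(n − 1), since each of the n − 1 bonds consumes two sites.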
  • Step 3.2: Modelling of Air Separation Polymeric Membrane Molecule
After formulating all the structural criteria, the objective function (Equation (30)) was encoded, where the predictive model for tensile strength, σ, was taken from Eslick et al. [45]. In the predictive model, $^{1}\chi$ is the first-order connectivity index, whereas $^{1}\chi^{v}$ is the first-order valence connectivity index. CD denotes the crosslink density, computed with Equation (31), where DC is the empirically determined degree of conversion, wi is the weight fraction of monomer i, nvi is the number of vinyl groups in monomer i and MWi is the molecular weight of monomer i.
$$\sigma = 1406.6 - 7484.5\,{}^{1}\chi + 6611.6\,{}^{1}\chi^{v} + 78{,}231.7\,CD_{max} - 149{,}268.6\,CD \tag{30}$$
$$CD = DC \sum_{i} \frac{w_i (n_{vi} - 1)}{MW_i} \tag{31}$$
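To make the objective function concrete, the following Python sketch evaluates the crosslink density and the tensile strength model for given inputs. It assumes that the terms without an explicit plus sign in Equation (30) are subtracted, and the function names, the argument layout and the numerical inputs in the usage line are hypothetical; the coefficients are those quoted from Eslick et al. [45].

```python
from typing import Iterable, Tuple


def crosslink_density(dc: float, monomers: Iterable[Tuple[float, int, float]]) -> float:
    """Equation (31): CD = DC * sum_i w_i * (n_vi - 1) / MW_i.

    Each monomer is given as (weight fraction, number of vinyl groups,
    molecular weight).
    """
    return dc * sum(w * (n_vinyl - 1) / mw for w, n_vinyl, mw in monomers)


def tensile_strength(chi1: float, chi1_v: float, cd_max: float, cd: float) -> float:
    """Equation (30), assuming the unsigned terms are subtracted."""
    return (
        1406.6
        - 7484.5 * chi1
        + 6611.6 * chi1_v
        + 78231.7 * cd_max
        - 149268.6 * cd
    )


# Hypothetical inputs: one monomer (w = 1) with two vinyl groups and MW = 100.
cd = crosslink_density(0.7, [(1.0, 2, 100.0)])
print(cd, tensile_strength(chi1=1.0, chi1_v=1.0, cd_max=cd, cd=cd))
```

A monomer with a single vinyl group contributes nothing to CD, because the factor (nvi − 1) vanishes.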
The framework was a single-objective problem aiming to maximise polymer tensile strength. However, mathematical programming formulations had to be developed to correlate the binary terms in the structural constraints with the connectivity indices in the predictive model. The correlation terms were derived for $^{1}\chi$ and $^{1}\chi^{v}$ in Equations (32) and (33), respectively.
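$${}^{1}\chi = \sum_{i=1}^{n_{max}} \sum_{p=i+1}^{n_{max}} \sum_{j=1}^{s_{max}} \frac{z_{ijp}}{\sqrt{\delta_i \delta_j}} \tag{32}$$
$${}^{1}\chi^{v} = \sum_{i=1}^{n_{max}} \sum_{p=i+1}^{n_{max}} \sum_{j=1}^{s_{max}} \frac{z_{ijp}}{\sqrt{\delta_i^{v} \delta_j^{v}}} \tag{33}$$
$$\delta_i = \sum_{i=1}^{n_{max}} u_{ik} \delta_k \, ; \quad k = 1, \ldots, m \tag{34}$$
$$\delta_i^{v} = \sum_{i=1}^{n_{max}} u_{ik} \delta_k^{v} \, ; \quad k = 1, \ldots, m \tag{35}$$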
Equations (32)–(35) were used to determine the connectivity index contributions of the bonds that attach groups at different positions. The connectivity indices within each first-order group were calculated using a similar approach, since the edges within the groups are known [46], as presented in Equations (36) and (37).
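$${}^{1}\chi = \sum_{i=1}^{n_{max}} \frac{u_{ik}}{\sqrt{\delta_i \delta_j}} \, ; \quad k = 1, \ldots, m \tag{36}$$
$${}^{1}\chi^{v} = \sum_{i=1}^{n_{max}} \frac{u_{ik}}{\sqrt{\delta_i^{v} \delta_j^{v}}} \, ; \quad k = 1, \ldots, m \tag{37}$$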
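For a fixed structure, the first-order connectivity index reduces to a sum over the bonds of the hydrogen-suppressed graph, with each bond contributing the reciprocal square root of the product of its two vertex δ values. The Python sketch below shows this calculation for a molecule whose bonds are already known; the function name `first_order_chi` and the n-butane example are illustrative assumptions, not part of the CAMD model.

```python
import math
from typing import Sequence, Tuple


def first_order_chi(deltas: Sequence[float], edges: Sequence[Tuple[int, int]]) -> float:
    """Sum 1/sqrt(delta_i * delta_j) over every bond (i, j) of the graph.

    Passing simple delta values gives the first-order connectivity index;
    passing valence delta values gives its valence counterpart.
    """
    return sum(1.0 / math.sqrt(deltas[i] * deltas[j]) for i, j in edges)


# n-Butane as a hydrogen-suppressed graph C0-C1-C2-C3.
# Simple delta = number of non-hydrogen neighbours: 1, 2, 2, 1.
butane_deltas = [1, 2, 2, 1]
butane_edges = [(0, 1), (1, 2), (2, 3)]
print(round(first_order_chi(butane_deltas, butane_edges), 3))
```

When the δ values depend on which group occupies a position, the same expression multiplies decision variables together, which is the source of the non-linearity in the formulation.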
Incorporating Equations (30)–(37) makes the optimisation formulation non-convex because of the trilinear terms. It is therefore necessary to linearise some of the equations so that the problem becomes convex and feasible solutions can be obtained. In this context, the trilinear terms in Equations (32) and (33) were modified so that the square-root denominator terms become known values, as demonstrated in Section 3.2.2.
  • Step 3.3: Incorporation of physical constraints in CAMD modelling
To ensure that a viable molecule could be generated, property constraints determined from RSML were included. Since tensile strength depends on the connectivity indices, as seen in Equation (30), only the selected predictive rules involving connectivity indices were included as constraints in CAMD. The remaining selected rules, which contain other topological indices, were cross-checked after the molecular structure had been derived. Each property constraint takes the form of an upper and/or lower bound derived from RSML to ensure that the designed molecule falls in the desired category. At this stage, the CAMD problem is formulated as maximising σ subject to the structural and property constraints. It was solved using the global solver in LINGO extended version 20.0 after transforming the non-linear terms in Equations (32) and (33) into convex functions, as elaborated in Section 3.2.2.
  • Step 4: Verification
Once a molecule had been generated from CAMD, it was first checked whether the molecule exists in the present polymer database. If it does, this supports the model's accuracy in discovering potential polymer structure candidates; where the generated polymer was not suitable for air separation purposes, integer cut constraints were incorporated to generate different solutions. If the molecule was not available in the present database, a literature review could be performed to determine its separation characteristics. If the designed molecule could be found neither in the literature nor in the existing database, experimental verification should be conducted to validate the molecule's properties, and the solutions from CAMD could be used to guide the focus of the experimental analysis. However, if the generated molecules were not able to meet the desired properties, the RSML model should be revisited and modified to improve the reliability and accuracy of its predictions.
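The form of the integer cut is not spelled out in this section. A common choice for binary decision vectors, shown in the Python sketch below purely as an assumption, excludes exactly one previously found solution y* by requiring that the sum of y over the positions where y* is 1, minus the sum of y over the positions where y* is 0, does not exceed the number of ones in y* minus one. The helper names `integer_cut` and `satisfies_cut` are hypothetical.

```python
from typing import List, Sequence, Tuple


def integer_cut(previous: Sequence[int]) -> Tuple[List[int], int]:
    """Return (coefficients, rhs) of the cut sum(c_i * y_i) <= rhs.

    Coefficients are +1 where the previous solution is 1 and -1 where it is 0;
    the right-hand side is the number of ones minus one.
    """
    coefficients = [1 if value == 1 else -1 for value in previous]
    rhs = sum(previous) - 1
    return coefficients, rhs


def satisfies_cut(candidate: Sequence[int], coefficients: Sequence[int], rhs: int) -> bool:
    """True if the candidate binary vector is still allowed by the cut."""
    return sum(c * y for c, y in zip(coefficients, candidate)) <= rhs


previous_solution = [1, 0, 1, 0]          # hypothetical flattened binary solution
coeffs, rhs = integer_cut(previous_solution)
print(satisfies_cut(previous_solution, coeffs, rhs))  # previous solution is excluded
print(satisfies_cut([1, 1, 1, 0], coeffs, rhs))       # any other vector stays feasible
```

Adding one such constraint per rejected structure and re-solving yields the next-best candidate without altering the rest of the model.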

3. Results and Discussions

The development of predictive models for polymer properties is a prerequisite for the CAMD formulation; therefore, the rules generated from RSML are discussed extensively before they are incorporated as property constraints in the CAMD problem. The approach for generating feasible polymer structures is then demonstrated with case studies.

3.1. Development of Predictive Models Using RSML

Polymer predictive models from RSML were used as constraints in the generation of feasible polymeric membrane molecules. The generation of cores, reducts and rules, as well as the selection of the most prominent rules, is discussed in the following sections.

3.1.1. Cores and Reducts Generation

Five information systems were established, one each for Tg, Vm, Ecoh, O2 permeability and O2/N2 selectivity. Each information system consisted of 12 conditional attributes, while the decision attribute was divided into two classes: Class 1 for the less desired property ranges and Class 2 for the more favoured property ranges. No core was generated from any of the information systems. Nevertheless, the numbers of reducts generated from the Tg, Vm, Ecoh, O2 permeability and O2/N2 selectivity information systems were 19, 20, 20, 11 and 10, respectively, and repeated rules were identified in the subsequent analysis.
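To illustrate what reducts and cores are, the Python sketch below enumerates them by brute force for a tiny decision table. It uses the usual rough set definitions (a reduct is a minimal attribute subset that preserves the positive region of the full attribute set, and the core is the intersection of all reducts); the four-row table and all names are hypothetical, and the exhaustive search is only practical for a handful of attributes, so it is not the algorithm used to obtain the results reported here.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Set, Tuple


def positive_region(
    rows: Sequence[Sequence[Hashable]], decisions: Sequence[int], attrs: Sequence[int]
) -> Set[int]:
    """Row indices whose equivalence class under attrs has a single decision."""
    classes = {}
    for index, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(index)
    region: Set[int] = set()
    for members in classes.values():
        if len({decisions[i] for i in members}) == 1:
            region.update(members)
    return region


def find_reducts(
    rows: Sequence[Sequence[Hashable]], decisions: Sequence[int]
) -> List[Tuple[int, ...]]:
    """All minimal attribute subsets preserving the full positive region."""
    all_attrs = tuple(range(len(rows[0])))
    target = positive_region(rows, decisions, all_attrs)
    reducts: List[Tuple[int, ...]] = []
    for size in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # a smaller reduct is contained in it, so not minimal
            if positive_region(rows, decisions, subset) == target:
                reducts.append(subset)
    return reducts


def core(reducts: Sequence[Tuple[int, ...]]) -> Set[int]:
    """Attributes shared by every reduct (empty set if there is none)."""
    if not reducts:
        return set()
    return set.intersection(*(set(r) for r in reducts))


# Hypothetical discretised table: three conditional attributes, two classes.
ROWS = [("low", "low", "high"), ("low", "high", "high"),
        ("high", "low", "low"), ("high", "high", "low")]
DECISIONS = [1, 1, 2, 2]
REDUCTS = find_reducts(ROWS, DECISIONS)
print(REDUCTS, core(REDUCTS))
```

In this toy table either the first or the third attribute alone separates the two classes, so there are two single-attribute reducts and their intersection, the core, is empty.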

3.1.2. Rules Generated from Reducts

All the rules generated from all the reducts were evaluated based on their strength, certainty and coverage. In this study, a total of 602 rules were generated from Tg reducts, 305 rules for Vm, 206 rules for Ecoh, 192 rules for permeability and 157 rules generated for selectivity. Table 2 presents the example rules from each decision class extracted from selectivity reduct 1.
Based on Table 2, rule 10 covers 4 of the 37 data points in the training dataset, giving a strength of 10.81%. It is therefore considered a feasible rule to apply in CAMD by constraining the Kappa Order 3 and Kappa Alpha Order 2 values to the ranges derived from RSML. Rule 10 also indicates that a Kappa Order 3 value of less than 3.83, together with the effect of Kappa Alpha Order 2, gives high certainty that the polymer has an O2/N2 selectivity of more than 4. All the other rules were interpreted in a similar manner; however, since a large number of rules fell under the desired property classes, further interpretation and analysis were performed to select reasonable rules.
Generally, the rules generated for Tg, Vm and Ecoh show higher coverage and strength than the rules derived for permeability and selectivity. This may be due to the limited data available for both oxygen permeability and O2/N2 selectivity, leading to low coverage and strength. All the rules were analysed using a validation dataset, and the filtered rules with their respective strength, coverage and certainty are shown in Appendix C, Table A3 and Table A4. The rules were filtered on the basis of a certainty of more than 75% and were selected for the desired properties class. However, it was observed that about 90% of the rules for the permeability and selectivity attributes were satisfied by only one data point in the validation set. In contrast, more of the physical property rules satisfied a certainty of more than 75%, and they had a higher average coverage (30%) than the transport property rules. The accuracy of individual rules is lower because those rules are developed to classify molecules into one of the categories. However, since the certainty is high for all the rules, the chosen molecules can be expected to meet the property in the desired range. These rules are further verified in the testing section and with respect to scientific findings.
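The filtering step described above can be sketched in a few lines of Python. The first two rules below are taken from the molar volume entries of Table A3; the last two are hypothetical rules added only so that the filter has something to reject. The `Rule` class and the function `filter_rules` are illustrative names, and ranking the retained rules by coverage is an assumption about how they might be prioritised.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    description: str
    decision_class: int
    strength: float   # fraction of all objects supporting the rule
    coverage: float   # fraction of the decision class covered
    certainty: float  # fraction of matching objects in the decision class


def filter_rules(rules: List[Rule], desired_class: int, min_certainty: float = 0.75) -> List[Rule]:
    """Keep rules of the desired class above the certainty threshold, best coverage first."""
    kept = [
        r for r in rules
        if r.decision_class == desired_class and r.certainty > min_certainty
    ]
    return sorted(kept, key=lambda r: r.coverage, reverse=True)


RULES = [
    Rule("Kappa Alpha Order 2 from 3.85 to 4.69", 2, 0.1786, 0.1818, 0.80),
    Rule("Kappa Alpha Order 3 from 1.32 to 1.92", 2, 0.3571, 0.4545, 1.00),
    Rule("hypothetical low-certainty rule", 2, 0.10, 0.12, 0.60),
    Rule("hypothetical rule for the other class", 1, 0.20, 0.25, 0.90),
]

for rule in filter_rules(RULES, desired_class=2):
    print(f"{rule.description}: coverage {rule.coverage:.2%}, certainty {rule.certainty:.0%}")
```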
Furthermore, Kappa shape indices, including the first, second and third orders as well as their alpha-modified forms, were present in approximately 65% of the filtered rules. Kappa shape indices can therefore be regarded as significant parameters encoding molecular structure information that could influence polymeric membrane performance. Nevertheless, overlapping cases should be noted, where the same polymer falls under more than one rule of the same decision class. Although a large number of rules were filtered out using the validation dataset, many rules still remained from which the final constraints for CAMD had to be selected.

3.1.3. Evaluation of Model Performance and Scientific Coherency of Rules Generated

The physical property rules were tested using a dataset retrieved from different reference sources to gauge the model performance when dealing with entirely new sets of molecules. After the testing evaluation, rules with a certainty above 80% were evaluated further by analysing any overlapping molecules and the scientific coherency between the conditional and decision attributes. Since the aim is to design a polymer with the desired physical and transport properties, only rules falling under the appropriate category were analysed further, namely Class 2 for the Tg, Vm, permeability and selectivity attributes and Class 1 for the Ecoh attribute.
The analysis showed that many polymers overlapped between the rules, particularly for the physical properties. Therefore, the rules were selected on the basis of the largest coverage, as summarised in Table 3. The strength, coverage and certainty of the rule combinations were tabulated from the testing data for the physical properties, while those for the transport properties were based on the validation data owing to the lack of available data. A higher $^{1}\chi$ indicates a higher number of vertices in the hydrogen-suppressed graph, i.e., more non-hydrogen atoms, which could lead to a higher Tg. In addition, Kappa order 3 provides more detailed molecular shape information than the first and second orders. A lower third-order Kappa value implies a more spherical and symmetrical molecular structure with more organised polymer chain packing, resulting in a higher Tg [47].
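The Kappa shape indices discussed here are not defined in this section. For orientation, the Python sketch below evaluates the first three orders using Kier's standard definitions, which are assumed here rather than taken from this article: with A non-hydrogen atoms and P1, P2 and P3 paths of length one, two and three, the first-order index is A(A − 1)²/P1², the second-order index is (A − 1)(A − 2)²/P2², and the third-order index is (A − 1)(A − 3)²/P3² for odd A or (A − 3)(A − 2)²/P3² for even A. The alpha-modified variants are not covered, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]


def count_paths(adjacency: Graph, length: int) -> int:
    """Number of simple paths with `length` edges, each path counted once."""

    def extend(path: List[int]) -> int:
        if len(path) - 1 == length:
            return 1
        return sum(
            extend(path + [nxt]) for nxt in adjacency[path[-1]] if nxt not in path
        )

    # Every undirected path is found twice, once from each end.
    return sum(extend([start]) for start in adjacency) // 2


def kappa_indices(adjacency: Graph) -> Tuple[float, float, float]:
    """First-, second- and third-order Kappa shape indices (no alpha correction).

    Raises ZeroDivisionError for graphs too small to contain paths of
    length two or three.
    """
    a = len(adjacency)
    p1 = count_paths(adjacency, 1)
    p2 = count_paths(adjacency, 2)
    p3 = count_paths(adjacency, 3)
    kappa1 = a * (a - 1) ** 2 / p1 ** 2
    kappa2 = (a - 1) * (a - 2) ** 2 / p2 ** 2
    if a % 2 == 1:
        kappa3 = (a - 1) * (a - 3) ** 2 / p3 ** 2
    else:
        kappa3 = (a - 3) * (a - 2) ** 2 / p3 ** 2
    return kappa1, kappa2, kappa3


# Unbranched four-atom chain as a hydrogen-suppressed graph.
FOUR_ATOM_CHAIN: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(kappa_indices(FOUR_ATOM_CHAIN))
```

For the unbranched four-atom chain, the three values coincide with the Kappa Order 1 to 3 entries listed for the four-carbon polymers in Table A1.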
For Vm, a polymer described by a higher Kappa order, in this case Kappa order 3, tends to have a lower molar volume; thus, the constraint for the third order is lower than that for the second order, as observed in Table 3. This is because branched or networked polymers occupy a smaller space than linear polymers of the same molecular weight [48]. Two rules were combined for Vm to increase the coverage. For Ecoh, a higher value of the zeroth-order connectivity index indicates a greater degree of connectivity within the polymer; at the same time, the cohesive energy can be reduced by a lower E-state index, which denotes less electronic delocalisation [49]. The polymer therefore has weaker intermolecular interactions, resulting in a lower cohesive energy. A similar concept applies to the rule consisting of a first-order connectivity index with a narrower limit. The third Ecoh rule can be explained by a higher Kappa alpha order 1 value, which describes a less ordered structure due to higher branching [50] and hence a lower cohesive energy.
In the case of permeability, two rules were combined to improve the coverage. A higher E-state index is associated with stronger intermolecular interactions with gas molecules, which reduce permeation through the polymer. Moreover, a higher degree of polymer branching, reflected by a higher value of Kappa order 3, brings about a more porous and open structure that allows gas molecules to diffuse through [50]. It is therefore scientifically reasonable to select the first permeability rule. The zeroth-order connectivity index was also shown to affect polymer permeability by Bicerano [33]. A more porous structure leads to higher permeability but lower selectivity; therefore, Kappa order 3 in the selectivity rule must be below 3.83 to prevent pores larger than oxygen molecules. The range of Kappa alpha order 2 might reflect the optimum range for oxygen selectivity. The second selectivity rule, which involves the Kappa flexibility index, ensures that the polymer formed is neither too flexible nor too rigid, so that it selectively allows oxygen but no other gases to permeate. All the constraints selected in Table 3 show trends that are consistent with scientific reasoning, with the numerical values derived by RSML from the patterns in the training dataset.
Table 3 also shows that the E-state index and Kappa shape indices feature widely in the selected rules. Since only connectivity index constraints were included in the optimisation framework, a reverse approach was used in the subsequent step to verify whether the generated molecules satisfy these RSML rules.
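The reverse check amounts to evaluating each selected rule against the descriptor values of a generated molecule. The Python sketch below does this for one interval rule from Table A3 (E-state Index from 14.42 to 25.42 and Kappa Alpha Order 2 from 2.76 to 3.46), using descriptor values listed in Table A1. Treating "from a to b" as a closed interval is an assumption, and the dictionary keys and the function name `satisfies` are hypothetical.

```python
from typing import Dict, Tuple

Rule = Dict[str, Tuple[float, float]]  # descriptor name -> (lower, upper) bound


def satisfies(rule: Rule, descriptors: Dict[str, float]) -> bool:
    """True if every descriptor named in the rule lies within its bounds."""
    return all(
        lower <= descriptors[name] <= upper for name, (lower, upper) in rule.items()
    )


# Interval rule from Table A3, read as closed intervals (assumption).
VM_RULE: Rule = {"e_state": (14.42, 25.42), "kappa_alpha_2": (2.76, 3.46)}

# Descriptor values copied from Table A1.
POLYSTYRENE = {"e_state": 16.67, "kappa_alpha_2": 2.31}
POLY_1_PHENYL_1_PROPYNE = {"e_state": 16.53, "kappa_alpha_2": 2.94}

print(satisfies(VM_RULE, POLYSTYRENE))              # Kappa Alpha Order 2 is out of range
print(satisfies(VM_RULE, POLY_1_PHENYL_1_PROPYNE))  # both descriptors are in range
```

One-sided rules can be expressed in the same structure by using `float("-inf")` or `float("inf")` for the missing bound.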

3.2. Generated Air Separation Polymer Molecules

This sub-section presents the results obtained from solving the optimisation model. Various case studies were conducted to produce a set of solutions. Molecules that fulfil structural, physical and transport property constraints are identified as the potential candidates to be used as air separation membranes.

3.2.1. Non-Convexity in CAMD Modelling

As mentioned previously, the typical formulation of this optimisation problem, as presented in Section 2, yields a mixed-integer non-linear programming (MINLP) problem. Because of the trilinear terms in Equations (32) and (33), the model is non-convex and therefore hard to solve. The non-linearities arise from the crosslinking term in the tensile strength model (Equation (31)) and the connectivity index correlation terms (Equations (32)–(37)). Therefore, the problem was reformulated as a convex problem by modifying the trilinear equations.
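Why products of decision variables destroy convexity can be seen with the simplest case, the bilinear function f(x, y) = xy. The Python sketch below applies the midpoint test: a convex function never exceeds the average of its end-point values at the midpoint, and a concave function never falls below it. The function names are hypothetical and the example is a generic illustration, not the trilinear term of the CAMD model itself.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def bilinear(x: float, y: float) -> float:
    """Product of two variables, the simplest multilinear term."""
    return x * y


def midpoint_gap(f: Callable[[float, float], float], a: Point, b: Point) -> float:
    """f(midpoint) minus the average of f(a) and f(b).

    A convex f gives a gap <= 0 for every pair of points;
    a concave f gives a gap >= 0 for every pair of points.
    """
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return f(*mid) - (f(*a) + f(*b)) / 2


print(midpoint_gap(bilinear, (0.0, 1.0), (1.0, 0.0)))  # positive: not convex
print(midpoint_gap(bilinear, (0.0, 0.0), (1.0, 1.0)))  # negative: not concave
```

Once one factor of the product is fixed to a known value, the term becomes linear in the remaining variable, which is the idea behind fixing the denominator terms.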

3.2.2. CAMD Model with Linearised Connectivity Index Terms

Since the non-linearities arose from the connectivity index correlation terms, linear formulations were proposed. All the structural constraints from Equations (19)–(29) were retained in this case study, while assumptions were made to derive the correlation terms. In the first attempt, only one heteroatom group, CF, was included together with the CH3 and CH2 groups, i.e., k = 3 in this case. To reduce the number of integer terms, m was set to 6. A further assumption was that any CF group present would be attached to three CH2 groups. Moreover, CH3 groups would only be attached to CH2.
With these connection assumptions, the first-order connectivity index was formulated as Equation (38), where nCF, nCH3 and nCH2 denote the numbers of CF, CH3 and CH2 groups, respectively. These numbers were obtained from the uik terms by specifying the index k. The first term in this equation covers the connectivity index within CF plus the bond connections between CF and its three CH2 groups. The second term gives the $^{1}\chi$ contribution between the CH3 and CH2 groups, and the final term gives the connectivity index between CH2 groups only, where nCF and nCH3 are deducted to avoid duplication, since two of the CH2 groups attached to CF are also attached to CH3, leaving one CH2 connected to CF that is not connected to CH3. Equation (39) ensures that the number of CH2 groups is equal to or greater than that of the other functional groups. $^{1}\chi^{v}$ follows the same approach, as in Equation (40). Since the edges of the groups are known, the δi values are specified, which makes the denominator terms in the equations known values, i.e., δCF, δCH2 and δCH3 are not variables, resulting in linear equations. Hence, Equations (32)–(35) were replaced with the following linear equations.
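$${}^{1}\chi = n_{CF}\left({}^{1}\chi_{CF} + \frac{3}{\sqrt{\delta_{CF}\,\delta_{CH_2}}}\right) + n_{CH_3}\,\frac{1}{\sqrt{\delta_{CH_2}\,\delta_{CH_3}}} + \left(n_{CH_2} - n_{CF} - n_{CH_3} + 1\right)\left(\frac{1}{\sqrt{(\delta_{CH_2})^{2}}}\right) \tag{38}$$
$$n_{CH_2} \geq 3\,n_{CF} + n_{CH_3} \tag{39}$$
$${}^{1}\chi^{v} = n_{CF}\left({}^{1}\chi^{v}_{CF} + \frac{3}{\sqrt{\delta^{v}_{CF}\,\delta^{v}_{CH_2}}}\right) + n_{CH_3}\,\frac{1}{\sqrt{\delta^{v}_{CH_2}\,\delta^{v}_{CH_3}}} + \left(n_{CH_2} - n_{CF} - n_{CH_3} + 1\right)\left(\frac{1}{\sqrt{(\delta^{v}_{CH_2})^{2}}}\right) \tag{40}$$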
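As a quick numerical check of the linearised expression, the Python sketch below evaluates Equation (38) for given group counts, assuming that the unsigned group counts in the last bracket are subtracted. The simple δ values for CH2 (2) and CH3 (1) are the usual counts of non-hydrogen neighbours; the values passed for the CF group are placeholders, since that group is absent in the example. The function name and argument names are hypothetical.

```python
import math


def linearised_chi1(
    n_cf: int,
    n_ch3: int,
    n_ch2: int,
    chi_cf: float,
    delta_cf: float,
    delta_ch2: float,
    delta_ch3: float,
) -> float:
    """Equation (38) with all delta values treated as known constants."""
    cf_term = n_cf * (chi_cf + 3.0 / math.sqrt(delta_cf * delta_ch2))
    ch3_term = n_ch3 / math.sqrt(delta_ch2 * delta_ch3)
    ch2_term = (n_ch2 - n_cf - n_ch3 + 1) / math.sqrt(delta_ch2 ** 2)
    return cf_term + ch3_term + ch2_term


# Butane: two CH3 groups, two CH2 groups and no CF group.
n_cf, n_ch3, n_ch2 = 0, 2, 2
assert n_ch2 >= 3 * n_cf + n_ch3  # Equation (39)
chi1 = linearised_chi1(
    n_cf, n_ch3, n_ch2,
    chi_cf=0.5, delta_cf=4.0,     # placeholder CF values, unused when n_cf = 0
    delta_ch2=2.0, delta_ch3=1.0,
)
print(round(chi1, 3))
```

For butane the expression reduces to two CH3–CH2 bonds and one CH2–CH2 bond, which gives the same value as summing the bond contributions of the four-carbon chain directly. Because every coefficient is a constant, the expression is linear in the group counts.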
Next, DC in the crosslink density term was estimated to be 0.7 and wi to be 1. The formulation is now convex because the terms in the denominators are known values rather than variables. Global optimum results could then be generated, since values of $^{1}\chi$ and $^{1}\chi^{v}$ were available for substitution into the objective function (Equation (30)). For this combination of functional groups, heteroatoms were found not to be favoured when maximising polymer tensile strength subject to the property constraint. The result obtained was a short hydrocarbon structure (butane).
As a result, it is evident that by linearising the correlation terms, the optimisation model becomes convex. This example only considers three first-order groups in the molecules—CF, CH3 and CH2; hence, with different first-order groups and specific group attachment assumptions, the structural constraints formulations from Equations (19)–(28) would need to be modified for each assumption. However, the formulation is linear and can generate reliable results for each class of polymer molecules.

3.3. Verification of Model

From Table 4 and Table 5, the potential candidates were generated according to the structural assumptions made, and six of the seven candidates are available in the existing polymer database [51]. This supports the model's accuracy in identifying potential polymer structure candidates and the potential of RSML to help generate new polymer molecules for effective air separation. The first assumption was used to analyse the performance of straight-chain hydrocarbons. The results showed that the monomer 1-hexene gave the optimum structure, fulfilling the permeability rules while also satisfying the O2 permeability classification under Class 2. However, none of the selectivity rules were fulfilled, which corresponds to its selectivity value of less than 4. Since the Tg and Vm of poly(1-hexene) fall under Class 1, none of the rules in Table 4 are satisfied. On the other hand, its Ecoh is under Class 1, i.e., less than 35,000 J/mol, satisfying the Ecoh rule generated from RSML, since poly(1-hexene) has $^{0}\chi \geq 2.5$ and an E-state index < 13.81. This supports the robustness and effectiveness of the predictive model from RSML.
Poly(4-methyl-1-pentene) is the branched isomer of poly(1-hexene). Although the assumption was altered to allow branching, the optimisation model still yielded a six-carbon structure, indicating that the six-carbon chain has the maximum tensile strength subject to the constraints. Furthermore, the tensile strength of the branched chain is expected to be lower than that of the straight hydrocarbon chain because less ordered chain packing leads to weaker intermolecular forces [52], which is also observed in Table 4. Although its tensile strength is lower, both its O2 permeability and selectivity fall within the desired classes, and it fulfils all the physical property requirements as well. This makes poly(4-methyl-1-pentene) a more attractive candidate for air separation membranes than poly(1-hexene).
Benzene rings were also considered in this study of air separation performance. Aromatic rings positioned in the polymer backbone potentially form a rigid structure with greater mechanical strength and thermal stability [53]. The hetero group incorporated was the carbonate functional group. CAMD identified polycarbonate as the optimum structure with high tensile strength. However, this result was obtained by relaxing one permeability constraint (the rule requiring $^{0}\chi$ between 4.63 and 5.08); even so, the polycarbonate structure still does not meet the other permeability rule, which requires an E-state index < 18.25 and Kappa Order 3 $\geq$ 4.67. The actual oxygen permeability of polycarbonate is 1.5 barrers (Class 1), which again supports the effectiveness and accuracy of the RSML predictive model, since the permeability of polycarbonate does not fall in Class 2. Although polycarbonate shows high tensile strength and meets the desired selectivity, it is not considered a potential candidate because of its low oxygen permeability.
Furthermore, polyphenylene oxide was the optimum structure from CAMD when branching and oxides on the benzene ring were considered. Polyphenylene oxide shows relatively high tensile strength and also satisfies the desired permeability and selectivity classes, which illustrates that the predictive rules selected from RSML are fulfilled in this case. Polyphenylene oxide has the highest cohesive energy among the generated molecules; even so, it is still within the acceptable range and does not show any adverse effect on the permeation of oxygen molecules. Therefore, polyphenylene oxide emerges as one of the potential candidates for air separation applications. The final generated molecule was polymethyl methacrylate, obtained by considering the presence of carbonate groups in a straight chain. Although it satisfies the permeability constraint, the molecule does not meet the desired selectivity. Its tensile strength from the CAMD model contradicted the trend reported in the literature, which might be due to inaccuracy in the objective function model taken from the literature. Nevertheless, based on scientific reasoning, benzene ring structures have greater mechanical strength than straight-chain structures [54].

4. Conclusions

In this study, a computer-aided molecular design (CAMD) framework incorporating rough set-based machine learning (RSML) algorithms was developed to identify polymeric structures that have the potential to be used for air separation. Topological indices were used to estimate both the physical and transport properties of polymer molecules, and deterministic rules were generated with RSML. The promising rules with the highest coverage and certainty were studied qualitatively from a scientific standpoint to ensure that they were reliable enough to be included as property constraints in CAMD modelling. The original non-convex formulation of the CAMD model was transformed into a convex equivalent by rewriting the equations in an alternative form. The results demonstrated that the rough set model was able to precisely predict the polymer characteristics of all the molecules generated from the optimisation model, supporting the reliability of the RSML predictive models. After analysing the results, poly(4-methyl-1-pentene) (PMP) and polyphenylene oxide (PPO) emerged as the most promising candidates for air separation, since these two polymers fulfil both the oxygen permeability and selectivity requirements as well as the desired physical properties in this study. The results indicate that the methodology proposed in this work could potentially be implemented for the systematic design of polymeric membrane structures for air separation. To improve the quality of the models predicted by this method in the future, it is suggested that the robustness and accuracy of the RSML model be enhanced by incorporating more attributes that could relate to the structure–property relationship. Furthermore, before a polymeric membrane is used for air separation applications, it is advisable to conduct economic analysis and feasibility studies to assess aspects such as scale-up feasibility.

Author Contributions

Conceptualization, J.O., N.G.C. and J.-W.C.; methodology, J.-Y.C., J.-Y.-L.L. and Q.-Y.T.; software, J.-W.C.; validation, J.-W.C., J.O. and N.G.C.; formal analysis, J.-Y.C.; investigation, J.O. and N.G.C.; resources, N.G.C. and J.-W.C.; data curation, J.-Y.C. and Q.-Y.T.; writing—original draft preparation, J.-Y.C. and J.-Y.-L.L.; writing—review and editing, N.G.C. and J.O.; visualization, J.-W.C.; supervision, N.G.C. and J.O.; project administration, N.G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

| Symbol | Description |
| --- | --- |
| $^{0}\chi$ | Zeroth Order Connectivity (Chi) Index |
| $^{0}\chi^{v}$ | Zeroth Order Valence Connectivity (Chi) Index |
| $^{1}\chi$ | First Order Connectivity (Chi) Index |
| $^{1}\chi^{v}$ | First Order Valence Connectivity (Chi) Index |
| $\delta_i$ | Number of sigma electrons in the hydrogen suppressed graph |
| $\delta_i^{v}$ | Number of valence electrons |
| $\delta_i^{v}\delta_j^{v}$ | Number of edges in the molecule, with bond s terminating on vertices i and j |
| $^{1}\kappa$ | First Order Kappa Shape Index |
| $^{2}\kappa$ | Second Order Kappa Shape Index |
| $^{3}\kappa$ | Third Order Kappa Shape Index |
| $^{1}\kappa_{\alpha}$ | First Order Kappa Alpha Shape Index |
| $^{2}\kappa_{\alpha}$ | Second Order Kappa Alpha Shape Index |
| $^{3}\kappa_{\alpha}$ | Third Order Kappa Alpha Shape Index |
| $\Phi$ | Kappa Flexibility Index |
| $u_{ik}$ | Binary variable that indicates if ith position is occupied by kth group |
| $z_{ijp}$ | Binary variable that indicates if ith group is attached to pth group via jth site |

Appendix A. Example of Information Table

Table A1. Part of permeability information table with 12 conditional attributes.
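| Tag | Polymer | Decision attribute: PO2 (Class) | Condition attributes: $^{0}\chi$ | $^{0}\chi^{v}$ | $^{1}\chi$ | $^{1}\chi^{v}$ | E-State Index | Kappa Order 1 | Kappa Order 2 | Kappa Order 3 | Kappa Alpha Order 1 | Kappa Alpha Order 2 | Kappa Alpha Order 3 | Kappa Flexibility Index |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Poly[1-(trimethylsilyl)-1-propyne] | 2 | 5.91 | 6.50 | 3.06 | 6.00 | 13.69 | 7.00 | 2.34 | 6.00 | 7.08 | 2.40 | 6.08 | 2.43 |
| 2 | Poly(tert-butylacetylene) | 2 | 5.21 | 2.91 | 2.56 | 1.06 | 15.42 | 6.00 | 1.63 | 5.33 | 5.56 | 1.34 | 4.95 | 1.24 |
| 3 | Poly(1-n-heptyl-propyne) | 2 | 7.66 | 7.24 | 4.91 | 4.31 | 26.95 | 10.00 | 9.00 | 9.14 | 9.56 | 8.56 | 8.71 | 8.18 |
| 4 | Poly[o-(trimethylsilyl)phenylacetylene] | 2 | 9.19 | 8.89 | 5.55 | 7.62 | 24.96 | 10.08 | 3.81 | 2.49 | 9.39 | 3.35 | 2.14 | 2.62 |
| 5 | Poly(1-chloro-2-n-butylacetylene) | 2 | 5.54 | 5.26 | 3.41 | 2.88 | 15.61 | 7.00 | 6.00 | 6.00 | 6.84 | 5.84 | 5.84 | 5.71 |
| 6 | Poly(1-chloro-2-n-hexylacetylene) | 2 | 6.95 | 6.67 | 4.41 | 3.88 | 21.51 | 9.00 | 8.00 | 8.00 | 8.84 | 7.84 | 7.84 | 7.71 |
| 7 | Poly(1-chloro-2-n-octylacetylene) | 2 | 8.36 | 8.08 | 5.41 | 4.88 | 27.28 | 11.00 | 10.00 | 10.00 | 10.84 | 9.84 | 9.84 | 9.70 |
| 8 | Poly[o-(trifluoromethyl)phenylacetylene] | 2 | 9.19 | 6.02 | 5.55 | 3.18 | 18.07 | 10.08 | 3.81 | 2.49 | 8.68 | 2.91 | 1.81 | 2.10 |
| 9 | Poly(1-n-hexyl-2-phenylacetylene) | 2 | 10.06 | 8.92 | 6.93 | 5.47 | 30.95 | 12.07 | 8.32 | 6.19 | 10.86 | 7.21 | 5.20 | 5.59 |
| 10 | Poly(1-ethyl-2-phenylacetylene) | 2 | 7.23 | 6.09 | 4.93 | 3.47 | 19.42 | 8.10 | 4.76 | 3.11 | 6.89 | 3.74 | 2.29 | 2.58 |
| 11 | Poly(1-phenyl-1-propyne) | 1 | 6.53 | 5.39 | 4.43 | 2.91 | 16.53 | 7.11 | 3.92 | 2.38 | 5.91 | 2.94 | 1.62 | 1.93 |
| 12 | Poly(1-chloro-2-phenylacetylene) | 1 | 6.53 | 5.52 | 4.43 | 2.98 | 13.97 | 7.11 | 3.92 | 2.38 | 6.19 | 3.16 | 1.79 | 2.17 |
| 13 | Poly(oxydimethylsilylene) | 2 | 8.41 | 9.14 | 4.27 | 9.17 | 25.28 | 10.00 | 2.94 | 5.53 | 10.70 | 3.40 | 6.20 | 3.64 |
| 14 | Hydrogenated Polybutadiene | 2 | 3.41 | 2.57 | 1.91 | 1.15 | 9.65 | 4.00 | 3.00 | 4.00 | 3.48 | 2.48 | 4.56 | 2.16 |
| 15 | Poly(1,3-butadiene) | 2 | 3.41 | 2.57 | 1.91 | 1.15 | 9.65 | 4.00 | 3.00 | 4.00 | 3.48 | 2.48 | 4.56 | 2.16 |
| 16 | Polyisoprene (NR) | 2 | 5.00 | 3.70 | 3.49 | 2.39 | 11.67 | 5.00 | 2.25 | 4.00 | 4.48 | 1.77 | 3.48 | 1.58 |
| 17 | Polychloroprene | 1 | 3.70 | 3.63 | 2.39 | 2.12 | 13.78 | 5.00 | 2.25 | 4.00 | 4.77 | 4.77 | 3.77 | 1.94 |
| 18 | Polystyrene | 1 | 5.40 | 4.67 | 3.97 | 3.02 | 16.67 | 6.13 | 3.11 | 1.80 | 5.10 | 2.31 | 1.21 | 1.48 |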

Appendix B. List of First-Order Groups

Table A2. Selected first-order groups.
| First-Order Groups | | | |
| --- | --- | --- | --- |
| CH3 | CH=CH2 | COOH | CH=O |
| CF | CCl | CH2OH | C=ONH2 |
| CH3Si | COO | -O- | CH2 |

Appendix C. Rules Filtered from Validation Dataset

Table A3. Physical properties rules filtered from validation dataset.
Glass Transition Temperature (Tg)
RuleDecisionStrengthCoverageCertainty
x   4.49 0 and Kappa Alpha Order 3 < 2.35Class 224%50%100%
x   4.49 0 and Kappa Order 3 < 2.98Class 224%50%100%
x   5.34 0 and Kappa Alpha Order 2 < 3.43Class 224%50%100%
x 1 2.94 and Kappa Alpha Order 3 < 2.35Class 224%50%100%
x 1 2.94 and Kappa Order 2 < 3.861Class 224%50%100%
x 1 2.94 and Kappa Order 3 < 2.98Class 224%50%100%
E-state Index 10.17 and Kappa Flexibility Index from 1.08 to 2.32Class 234%57%80%
Molar Volume (Vm)
Kappa Alpha Order 2 ≥ 7.03 | Class 2 | 10.71% | 13.64% | 100%
Kappa Alpha Order 2 from 5.27 to 6.4 | Class 2 | 7.14% | 9.09% | 100%
Kappa Alpha Order 2 ≥ 4.89 and Kappa Alpha Order 3 from 3.96 to 6.31 | Class 2 | 7.14% | 9.09% | 100%
Kappa Alpha Order 2 from 3.85 to 4.69 | Class 2 | 17.86% | 18.18% | 80%
Kappa Alpha Order 2 ≥ 2.96 and Kappa Alpha Order 3 ≥ 3.19 | Class 2 | 21.43% | 27.27% | 100%
Kappa Alpha Order 2 ≥ 2.40 and Kappa Alpha Order 3 ≥ 1.92 | Class 2 | 42.86% | 50% | 91.67%
Kappa Alpha Order 3 from 5.16 to 6.31 | Class 2 | 3.57% | 4.55% | 100%
Kappa Flexibility Index ≥ 6.65 | Class 2 | 10.71% | 13.64% | 100%
Kappa Flexibility Index from 3.66 to 4.45 | Class 2 | 17.86% | 18.18% | 80%
Kappa Alpha Order 3 from 3.96 to 6.31 and Kappa Flexibility Index ≥ 4.66 | Class 2 | 3.57% | 4.55% | 100%
Kappa Alpha Order 3 ≥ 3.19 and Kappa Flexibility Index ≥ 2.54 | Class 2 | 7.14% | 9.09% | 100%
Kappa Alpha Order 3 from 1.19 to 1.92 and Kappa Flexibility Index ≥ 1.67 | Class 2 | 35.71% | 45.45% | 100%
xV0 ≥ 7.76 | Class 2 | 25% | 31.82% | 100%
xV0 from 5.61 to 6.56 | Class 2 | 28.56% | 36.36% | 100%
xV0 from 5.16 to 5.32 | Class 2 | 3.57% | 4.55% | 100%
xV0 from 5.56 to 5.59 | Class 2 | 7.14% | 9.09% | 100%
x1 ≥ 3.98 and Kappa Alpha Order 2 < 5.17 | Class 2 | 39.29% | 45.45% | 90.9%
x1 ≥ 5.68 | Class 2 | 32.14% | 36.36% | 88.89%
x1 from 3.98 to 5.65 and Kappa Alpha Order 3 ≥ 3.96 | Class 2 | 7.14% | 9.09% | 100%
x1 ≥ 3.98 and Kappa Alpha Order 3 < 3.93 | Class 2 | 57.14% | 63.64% | 87.5%
x1 from 3.24 to 3.68 and Kappa Alpha Order 3 ≥ 3.51 | Class 2 | 3.57% | 4.55% | 100%
x1 from 3.96 to 4.71 and Kappa Alpha Order 3 ≥ 1.32 | Class 2 | 21.43% | 22.72% | 83.33%
x1 ≥ 3.98 and Kappa Flexibility Index < 4.85 | Class 2 | 67.86% | 77.27% | 89.47%
x1 from 3.98 to 5.65 and E-state Index ≥ 25.58 | Class 2 | 17.86% | 18.18% | 80%
x1 ≥ 3.98 and E-state Index < 25.42 | Class 2 | 32.14% | 36.36% | 88.89%
x1 ≥ 3.24 and E-state Index from 13.25 to 15.5 | Class 2 | 3.57% | 4.55% | 100%
x1 from 3.96 to 4.71 and Kappa Alpha Order 1 ≥ 5.85 | Class 2 | 25% | 27.27% | 85.71%
x1 from 4.76 to 5.65 | Class 2 | 10.71% | 13.64% | 100%
x1 < 3.68 and Kappa Alpha Order 1 ≥ 6.68 | Class 2 | 3.57% | 4.55% | 100%
xV1 ≥ 4.09 | Class 2 | 35.71% | 40.91% | 90%
xV1 from 3.05 to 3.54 | Class 2 | 28.57% | 31.82% | 87.5%
xV1 from 3.64 to 4.07 | Class 2 | 7.14% | 9.09% | 100%
E-state Index ≥ 25.58 and Kappa Alpha Order 2 from 2.96 to 6.4 | Class 2 | 32.14% | 36.36% | 88.89%
E-state Index from 14.42 to 25.42 and Kappa Alpha Order 2 from 2.76 to 3.46 | Class 2 | 17.86% | 18.18% | 80%
Kappa Alpha Order 2 from 2.4 to 2.5 | Class 2 | 3.57% | 4.55% | 100%
E-state Index ≥ 36.33 | Class 2 | 14.29% | 18.18% | 100%
E-state Index from 20.92 to 30.01 and Kappa Alpha Order 3 ≥ 3.96 | Class 2 | 10.71% | 13.64% | 100%
E-state Index ≥ 30.42 and Kappa Alpha Order 3 ≥ 5.16 | Class 2 | 3.57% | 4.55% | 100%
E-state Index from 14.42 to 15.08 | Class 2 | 3.57% | 4.55% | 100%
Kappa Alpha Order 3 from 1.32 to 1.92 | Class 2 | 35.71% | 45.45% | 100%
E-state Index ≥ 27.39 and Kappa Flexibility Index from 2.06 to 6.36 | Class 2 | 25% | 27.27% | 85.71%
E-state Index from 20.92 to 27.25 and Kappa Flexibility Index from 1.71 to 3.21 | Class 2 | 7.14% | 9.09% | 100%
E-state Index ≥ 13.25 and Kappa Flexibility Index from 3.66 to 4.85 | Class 2 | 17.86% | 18.18% | 80%
Kappa Flexibility Index from 4.86 to 6.36 | Class 2 | 3.57% | 4.55% | 100%
E-state Index from 13.81 to 18.63 and Kappa Flexibility Index from 1.67 to 2.17 | Class 2 | 7.14% | 9.09% | 100%
E-state Index from 25.58 to 30.31 and Kappa Order 1 ≥ 7.06 | Class 2 | 21.43% | 22.73% | 83.33%
E-state Index ≥ 30.42 and Kappa Order 1 ≥ 6.06 | Class 2 | 25% | 31.82% | 100%
E-state Index from 17.67 to 24.43 and Kappa Order 1 ≥ 7.06 | Class 2 | 32.14% | 36.36% | 88.89%
E-state Index < 15.5 and Kappa Order 1 ≥ 6.06 | Class 2 | 7.14% | 9.09% | 100%
E-state Index < 25.42 and Kappa Order 1 ≥ 8.05 | Class 2 | 17.86% | 18.18% | 80%
E-state Index from 25.58 to 30.31 and Kappa Order 2 ≥ 4.23 | Class 2 | 17.86% | 18.18% | 80%
E-state Index ≥ 30.42 and Kappa Order 2 ≥ 3.22 | Class 2 | 25% | 31.82% | 100%
E-state Index from 15.5 to 22.64 and Kappa Order 2 from 3.22 to 4.15 | Class 2 | 21.43% | 27.27% | 100%
Kappa Order 2 ≥ 7.82 | Class 2 | 7.14% | 9.09% | 100%
E-state Index ≥ 27.39 and Kappa Order 2 from 3.09 to 4.15 | Class 2 | 7.14% | 9.09% | 100%
E-state Index ≥ 30.42 and Kappa Order 3 ≥ 3.06 | Class 2 | 17.86% | 22.72% | 100%
E-state Index < 15.5 and Kappa Order 3 ≥ 5.36 | Class 2 | 3.57% | 4.55% | 100%
E-state Index from 13.81 to 15.5 and Kappa Order 3 < 3.92 | Class 2 | 3.57% | 4.55% | 100%
E-state Index from 25.58 to 30.31 and Kappa Alpha Order 1 ≥ 8.04 | Class 2 | 17.86% | 18.18% | 80%
E-state Index ≥ 30.42 and Kappa Alpha Order 1 ≥ 5.85 | Class 2 | 25% | 31.82% | 100%
E-state Index from 18.63 to 24.43 and Kappa Alpha Order 1 ≥ 6.98 | Class 2 | 17.86% | 18.18% | 80%
E-state Index < 18.63 and Kappa Alpha Order 1 from 5.85 to 7.52 | Class 2 | 10.71% | 13.64% | 100%
Kappa Order 2 from 5.16 to 6.98 and Kappa Alpha Order 3 ≥ 3.96 | Class 2 | 7.14% | 9.09% | 100%
Kappa Order 2 ≥ 3.97 and Kappa Alpha Order 3 < 3.19 | Class 2 | 17.86% | 22.72% | 100%
Kappa Order 2 from 3.16 to 3.93 and Kappa Alpha Order 3 < 2.3 | Class 2 | 28.57% | 31.82% | 87.5%
Kappa Order 2 ≥ 4.23 and Kappa Alpha Order 3 < 3.93 | Class 2 | 17.86% | 18.18% | 80%
Kappa Order 2 ≥ 7.32 and Kappa Alpha Order 3 ≥ 6.82 | Class 2 | 10.71% | 13.64% | 100%
Kappa Order 2 from 3.09 to 3.16 and Kappa Alpha Order 3 ≥ 1.32 | Class 2 | 3.57% | 4.55% | 100%
Kappa Order 2 < 5.72 and Kappa Alpha Order 1 ≥ 8.04 | Class 2 | 25% | 27.27% | 85.71%
Kappa Alpha Order 2 ≥ 11.36 | Class 2 | 17.86% | 22.72% | 100%
Kappa Order 3 ≥ 3.06 and Kappa Alpha Order 1 from 6.68 to 7.52 | Class 2 | 3.57% | 4.55% | 100%
Kappa Order 2 from 3.09 to 3.93 and Kappa Alpha Order 1 ≥ 5.85 | Class 2 | 32.14% | 36.36% | 88.89%
Kappa Order 2 < 6.98 and Kappa Alpha Order 1 ≥ 9.42 | Class 2 | 10.71% | 13.64% | 100%
Kappa Order 3 from 4.49 to 7.09 and Kappa Alpha Order 2 ≥ 4.89 | Class 2 | 7.14% | 9.09% | 100%
Kappa Order 3 < 3.59 and Kappa Alpha Order 2 ≥ 2.96 | Class 2 | 21.4% | 27.27% | 100%
Kappa Alpha Order 2 ≥ 7.03 | Class 2 | 10.71% | 13.64% | 100%
Kappa Order 3 < 4.37 and Kappa Alpha Order 2 ≥ 4.89 | Class 2 | 3.57% | 4.55% | 100%
Kappa Order 3 < 2.6 and Kappa Alpha Order 2 ≥ 2.4 | Class 2 | 42.86% | 50% | 91.67%
Kappa Order 3 < 3.92 and Kappa Alpha Order 2 ≥ 3.85 | Class 2 | 7.14% | 9.09% | 100%
Kappa Order 3 from 1.6 to 2.6 and Kappa Flexibility Index ≥ 1.67 | Class 2 | 35.71% | 45.45% | 100%
Kappa Order 3 from 5.36 to 7.09 | Class 2 | 7.14% | 9.09% | 100%
Kappa Order 3 from 3.06 to 3.59 | Class 2 | 7.14% | 9.09% | 100%
Cohesive Energy (Ecoh)
x0 ≥ 2.5 and E-state Index < 13.81 | Class 1 | 23.5% | 57.14% | 100%
E-state Index < 10.75 | Class 1 | 23.5% | 57.14% | 100%
x0 from 2.5 to 3.78 | Class 1 | 5.88% | 14.29% | 100%
Kappa Alpha Order 1 < 2.69 | Class 1 | 11.77% | 28.57% | 100%
x0 < 4.7 and Kappa Alpha Order 2 ≥ 1.73 | Class 1 | 11.77% | 28.57% | 100%
x0 < 4.7 and Kappa Flexibility Index ≥ 1.57 | Class 1 | 11.77% | 28.57% | 100%
Kappa Order 3 ≥ 3.25 and Kappa Flexibility Index from 1.52 to 2.33 | Class 1 | 11.77% | 28.57% | 100%
x1 from 1.4 to 3.59 and E-state Index < 15.08 | Class 1 | 23.53% | 57.14% | 100%
x1 from 1.4 to 2.13 | Class 1 | 11.77% | 28.57% | 100%
x1 from 2.35 to 2.6 | Class 1 | 5.88% | 14.29% | 100%
x1 < 3.59 and Kappa Alpha Order 2 from 1.73 to 2.57 | Class 1 | 17.65% | 42.86% | 100%
E-state Index < 13.81 and Kappa Order 1 ≥ 3.1 | Class 1 | 23.53% | 57.14% | 100%
x1 < 2.6 and Kappa Flexibility Index ≥ 1.57 | Class 1 | 11.77% | 28.57% | 100%
E-state Index from 11.33 to 13.81 | Class 1 | 11.77% | 28.57% | 100%
E-state Index < 13.81 and Kappa Alpha Order 1 ≥ 2.72 | Class 1 | 23.53% | 57.14% | 100%
E-state Index < 13.81 and Kappa Alpha Order 2 ≥ 1.73 | Class 1 | 11.77% | 28.57% | 100%
E-state Index < 20.92 and Kappa Alpha Order 3 ≥ 3.42 | Class 1 | 11.77% | 28.57% | 100%
E-state Index < 13.81 and Kappa Flexibility Index ≥ 1.57 | Class 1 | 11.77% | 28.57% | 100%
E-state Index < 13.81 and Kappa Flexibility Index < 1.52 | Class 1 | 23.53% | 57.14% | 100%
xV0 from 1.85 to 2.73 | Class 1 | 17.65% | 42.86% | 100%
xV1 < 1.07 | Class 1 | 11.77% | 28.57% | 100%
Table A4. Transport properties rules filtered from validation dataset.
O2 Permeability
Rule | Decision | Strength | Coverage | Certainty
x0 from 4.63 to 5.08 | Class 2 | 5.56% | 20% | 100%
xV0 from 4.55 to 4.62 | Class 2 | 5.56% | 20% | 100%
x1 from 2.71 to 2.78 | Class 2 | 5.56% | 20% | 100%
x1 from 2.92 to 3.12 | Class 2 | 5.56% | 20% | 100%
xV1 from 5.84 to 6.13 | Class 2 | 5.56% | 20% | 100%
E-state Index < 18.24 and Kappa Order 3 ≥ 4.67 | Class 2 | 11.11% | 40% | 100%
Kappa Order 3 from 5.43 to 6.28 | Class 2 | 5.56% | 20% | 100%
Kappa Alpha Order 1 from 5.52 to 5.75 | Class 2 | 5.56% | 20% | 100%
Kappa Alpha Order 2 from 2.94 to 3.02 | Class 2 | 5.56% | 20% | 100%
Kappa Alpha Order 3 from 4.87 to 5.51 | Class 2 | 5.56% | 20% | 100%
O2/N2 Selectivity
x0 ≥ 9.92 | Class 2 | 25% | 42.86% | 75%
x0 from 6.26 to 6.61 and Kappa Alpha Order 2 ≥ 2.94 | Class 2 | 6.25% | 14.29% | 100%
xV0 ≥ 9.99 | Class 2 | 6.25% | 14.29% | 100%
x1 ≥ 6.16 | Class 2 | 25% | 42.86% | 75%
xV1 from 6.08 to 7.58 | Class 2 | 12.5% | 28.57% | 100%
E-state Index from 22.09 to 23.82 | Class 2 | 12.5% | 28.57% | 100%
E-state Index ≥ 31.88 | Class 2 | 12.5% | 28.57% | 100%
Kappa Order 1 ≥ 12.22 | Class 2 | 12.5% | 28.57% | 100%
Kappa Order 2 from 6.12 to 7.84 | Class 2 | 6.25% | 14.29% | 100%
Kappa Order 2 from 0.67 to 1.48 | Class 2 | 6.25% | 14.29% | 100%
Kappa Order 3 < 2.3 and Kappa Alpha Order 2 ≥ 2.38 | Class 2 | 6.25% | 14.29% | 100%
Kappa Order 3 < 3.83 and Kappa Alpha Order 2 from 0.67 to 1.77 | Class 2 | 6.25% | 14.29% | 100%
Kappa Order 3 < 3.83 and Kappa Alpha Order 2 ≥ 4.03 | Class 2 | 6.25% | 14.29% | 100%
Kappa Alpha Order 1 ≥ 11.34 | Class 2 | 12.5% | 28.57% | 100%
Kappa Flexibility Index from 3.8 to 5.55 | Class 2 | 6.25% | 14.29% | 100%
Kappa Flexibility Index from 1.7 to 1.85 | Class 2 | 6.25% | 14.29% | 100%
Kappa Flexibility Index from 2.72 to 3.32 | Class 2 | 6.25% | 14.29% | 100%

References

  1. Gas Separation Membrane Market (2023–2032). The Business Research Company. 2023. Available online: https://www.openpr.com/news/3068812/gas-separation-membrane-market-2023-2032-top-companies (accessed on 7 June 2023).
  2. Lasseuguette, E.; Comesaña-Gándara, B. Polymer Membranes for Gas Separation. Membranes 2022, 12, 207. [Google Scholar] [CrossRef] [PubMed]
  3. Murali, R.S.; Sankarshana, T.; Sridhar, S. Air separation by polymer-based membrane technology. Sep. Purif. Rev. 2013, 42, 130–186. [Google Scholar] [CrossRef]
  4. Chong, K.C.; Lai, S.O.; Thiam, H.S.; Teoh, H.C.; Heng, S.L. Recent progress of oxygen/nitrogen separation using membrane technology. J. Eng. Sci. Technol. 2016, 11, 1016–1030. [Google Scholar]
  5. Bell, J. What Is Machine Learning? In Machine Learning and the City; Wiley: New York, NY, USA, 2022; pp. 207–216. [Google Scholar] [CrossRef]
  6. El-Banbi, A.; Alzahabi, A.; El-Maraghi, A. Artificial Neural Network Models for PVT Properties. PVT Prop. Correl. 2018, 225–247. [Google Scholar] [CrossRef]
  7. Tayyebi, A.; Alshami, A.S.; Yu, X.; Kolodka, E. Can machine learning methods guide gas separation membranes fabrication? J. Membr. Sci. Lett. 2022, 2, 100033. [Google Scholar] [CrossRef]
  8. Pedrycz, W.; Succi, G. Genetic granular classifiers in modeling software quality. J. Syst. Softw. 2005, 76, 277–285. [Google Scholar] [CrossRef]
  9. Pawlak, L.; Grzvmala-Busse, L.; Slowinski, R.; Ziarko, W. Rough Sets. Commun ACM 1995, 38, 88–95. [Google Scholar] [CrossRef] [Green Version]
  10. Aviso, K.B.; Janairo, J.I.B.; Promentilla, M.A.B.; Tan, R.R. Prediction of CO2 storage site integrity with rough set-based machine learning. Clean Technol. Environ. Policy 2019, 21, 1655–1664. [Google Scholar] [CrossRef]
  11. Lei, L.; Chen, W.; Wu, B.; Chen, C.; Liu, W. A building energy consumption prediction model based on rough set theory and deep learning algorithms. Energy Build. 2021, 240, 110886. [Google Scholar] [CrossRef]
  12. Heng, Y.P.; Lee, H.Y.; Chong, J.W.; Tan, R.R.; Aviso, K.B.; Chemmangattuvalappil, N.G. Incorporating Machine Learning in Computer-Aided Molecular Design for Fragrance Molecules. Processes 2022, 10, 1767. [Google Scholar] [CrossRef]
  13. Chong, J.W.; Ng, L.Y.; Aboagwa, O.A.; Thangalazhy-Gopakumar, S.; Muthoosamy, K.; Chemmangattuvalappil, N.G. Computer-Aided Framework for the Design of Optimal Bio-Oil/Solvent Blend with Economic Considerations. Processes 2021, 9, 2159. [Google Scholar] [CrossRef]
  14. Pawlak, Z. Rough set approach to knowledge-based decision support. Eur. J. Oper. Res. 1997, 99, 48–57. [Google Scholar] [CrossRef] [Green Version]
  15. Pawlak, Z. Rough sets, decision algorithms and Bayes’ theorem. Eur. J. Oper. Res. 2002, 136, 181–189. [Google Scholar] [CrossRef] [Green Version]
  16. Churi, N.; Achenie, L.E.K. Novel Mathematical Programming Model for Computer Aided Molecular Design. Ind. Eng. Chem. Res. 1996, 35, 3788–3794. [Google Scholar] [CrossRef]
  17. Zhou, T.; Mcbride, K.; Zhang, X.; Qi, Z.; Sundmacher, K. Integrated solvent and process design exemplified for a Diels–Alder reaction. AIChE J. 2015, 61, 147–158. [Google Scholar] [CrossRef]
  18. Harper, P.M.; Gani, R.; Kolar, P.; Ishikawa, T. Computer-aided molecular design with combined molecular modeling and group contribution. Fluid Phase Equilib. 1999, 158–160, 337–347. [Google Scholar] [CrossRef]
  19. Sun, G.; Fan, T.; Sun, X.; Hao, Y.; Cui, X.; Zhao, L.; Ren, T.; Zhou, Y.; Zhong, R.; Peng, Y. In Silico Prediction of O6-Methylguanine-DNA Methyltransferase Inhibitory Potency of Base Analogs with QSAR and Machine Learning Methods. Molecules 2018, 23, 2892. [Google Scholar] [CrossRef] [Green Version]
  20. Wang, F.; Cheng, H. Computer-aided biocompatible solvent design for an integrated extractive fermentation–separation process. Chem. Eng. J. 2010, 162, 809–820. [Google Scholar]
  21. Ooi, J.; Ng, D.K.S.; Chemmangattuvalappil, N.G. Optimal molecular design towards an environmental friendly solvent recovery process. Comput. Chem. Eng. 2018, 117, 391–409. [Google Scholar] [CrossRef]
  22. Scheffczyk, J.; Fleitmann, L.; Schwarz, A.; Lampe, M.; Bardow, A.; Leonhard, K. COSMO-CAMD: A framework for optimization-based computer-aided molecular design using COSMO-RS. Chem. Eng. Sci. 2017, 159, 84–92. [Google Scholar] [CrossRef]
  23. Yee, Q.Y.; Hassim, M.H.; Chemmangattuvalappil, N.G.; Ten, J.Y.; Raslan, R. Optimization of quality, safety and health aspects in personal care product preservative design. Process Saf. Environ. Prot. 2022, 157, 246–253. [Google Scholar] [CrossRef]
  24. Satyanarayana, K.C.; Abildskov, J.; Gani, R.A. Computer-aided polymer design using group contribution plus property models. Comput. Chem. Eng. 2009, 33, 1004–1013. [Google Scholar] [CrossRef]
  25. Guo, W.; Chai, S.; Zhang, L.; Du, J. Computer-Aided Design of Crosslinked Polymer Membrane Using Machine Learning and Molecular Dynamics. Chem. Ing. Tech. 2022, 95, 447–457. [Google Scholar] [CrossRef]
  26. Zhang, L.; Mao, H.; Liu, L.; Du, J.; Gani, R. A machine learning based computer- aided molecular design/screening methodology for fragrance molecules. Comput. Chem. Eng. 2018, 115, 295–308. [Google Scholar] [CrossRef]
  27. Ooi, Y.J.; Aung, K.N.G.; Chong, J.W.; Tan, R.R.; Aviso, K.B.; Chemmangattuvalappil, N.G. Design of fragrance molecules using computer-aided molecular design with machine learning. Comput. Chem. Eng. 2022, 157, 107585. [Google Scholar] [CrossRef]
  28. Radhakrishnapany, K.T.; Wong, C.Y.; Tan, F.K.; Chong, J.W.; Tan, R.R.; Aviso, K.B.; Janairo, J.I.B.; Chemmangattuvalappil, N.G. Design of fragrant molecules through the incorporation of rough sets into computer-aided molecular design. Mol. Syst. Des. Eng. 2020, 5, 1391–1416. [Google Scholar] [CrossRef]
  29. Chemmangattuvalappil, N.G. Development of solvent design methodologies using computer-aided molecular design tools. Curr. Opin. Chem. Eng. 2020, 27, 51–59. [Google Scholar] [CrossRef]
  30. Zhang, L.; Mao, H.; Liu, Q.; Gani, R. Chemical product design–recent advances and perspectives. Curr. Opin. Chem. Eng. 2020, 27, 22–34. [Google Scholar] [CrossRef]
  31. Harlacher, T.; Wessling, M. Gas–Gas Separation by Membranes. In Progress in Filtration and Separation; Academic Press: Cambridge, MA, USA, 2015; pp. 557–584. [Google Scholar] [CrossRef]
  32. Liu, Y.; Li, N.; Cui, X.; Yan, W.; Su, J.; Jin, L. A Review on the Morphology and Material Properties of the Gas Separation Membrane: Molecular Simulation. Membranes 2022, 12, 1274. [Google Scholar] [CrossRef]
  33. Bicerano, J. Prediction of Polymer Properties; CRC Press: Boca Raton, FL, USA, 2002. [Google Scholar] [CrossRef]
  34. Eichenhofer, M.; Arreguin, S.; Wong, J. Neurogastroenterology and Motility; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2019; pp. 1–5. [Google Scholar]
  35. Van Krevelen, D.W.; Nijenhuis, K.T. Chapter 1—Polymer Properties. In Properties of Polymers; Elsevier: Amsterdam, The Netherlands, 2009; pp. 3–5. [Google Scholar]
  36. Jia, L.; Xu, J. A simple method for prediction of gas permeability of polymers from their molecular structure. Polym. J. 1991, 23, 417–425. [Google Scholar] [CrossRef] [Green Version]
  37. Rahman, M.M. Membrane Separation of Gaseous Hydrocarbons by Semicrystalline Multiblock Copolymers: Role of Cohesive Energy Density and Crystallites of the Polyether Block. Polymers 2021, 13, 4181. [Google Scholar] [CrossRef] [PubMed]
  38. Koros, W.J.; Mahajan, R. Pushing the limits on possibilities for large scale gas separation: Which strategies? J. Memb. Sci. 2000, 175, 181–196. [Google Scholar] [CrossRef]
  39. Ivanciuc, O. Electrotopological State Indices; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2007; pp. 85–109. [Google Scholar] [CrossRef]
  40. Hall, L.H.; Kier, L.B. The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling. Rev. Comput. Chem. 2007, 2, 367–422. [Google Scholar] [CrossRef]
  41. Calibration, M.; Kier, L.B. I21 Index of Molecular Flexibility from Kappa Shape Attributes. Comput. Chem. 1989, 8, 735. [Google Scholar] [CrossRef]
  42. Martin, T. User’s Guide for T. E. S. T. (Toxicity Estimation Software Tool) Version 5.1 A Java Application to Estimate Toxicities and Physical Properties from Molecular Structure; US Environmental Protection Agency: Cincinnati, OH, USA, 2020.
  43. Prędki, B.; Słowiński, R.; Stefanowski, J.; Susmaga, R.; Wilk, S. ROSE—Software implementation of the rough set theory. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 1998; Volume 1424, pp. 605–608. [Google Scholar] [CrossRef]
  44. Kagramanov, G.; Gurkin, V.; Farnosova, E. Physical and Mechanical Properties of Hollow Fiber Membranes and Technological Parameters of the Gas Separation Process. Membranes 2021, 11, 583. [Google Scholar] [CrossRef]
  45. Eslick, J.C.; Ye, Q.; Park, J.; Topp, E.M.; Spencer, P.; Camarda, K.V. A computational molecular design framework for crosslinked polymer networks. Comput. Chem. Eng. 2009, 33, 954–963. [Google Scholar] [CrossRef] [Green Version]
  46. Conte, E.; Martinho, A.; Matos, H.A.; Gani, R. Combined group-contribution and atom connectivity index-based methods for estimation of surface tension and viscosity. Ind. Eng. Chem. Res 2008, 47, 7940–7954. [Google Scholar] [CrossRef]
  47. Cao, C.; Lin, Y. Correlation between the glass transition temperatures and repeating unit structure for high molecular weight polymers. J. Chem. Inf. Comput. Sci. 2003, 43, 643–650. [Google Scholar] [CrossRef]
  48. Fried, J.R. Polymer Science and Technology, 3rd ed.; Pearson: London, UK, 2014. [Google Scholar]
  49. Sulchek, T.A.; Friddle, R.W.; Noy, A. Counting and Breaking Single Bonds: Dynamic Force Spectroscopy in Tethered Single Molecule Systems. In Handbook of Molecular Force Spectroscopy; Springer: Berlin/Heidelberg, Germany, 2008; pp. 251–272. [Google Scholar] [CrossRef] [Green Version]
  50. Stevens, M.P. Polymer Chemistry: An Introduction, 3rd ed.; Oxford University Press: New York, NY, USA, 1999. [Google Scholar]
  51. Mark, J.E. Polymer Data Polymer Data. J. Am. Chem. Soc. 2009, 131, 655–657. [Google Scholar]
  52. AlMaadeed, M.A.; Ouederni, M.; Khanam, P.N. Effect of chain structure on the properties of Glass fibre/polyethylene composites. Mater. Des. 2013, 47, 725–730. [Google Scholar] [CrossRef]
  53. Mohanty, A.D.; Bae, C. Transition Metal-Catalyzed Functionalization of Polyolefins Containing CC, CC, and CH Bonds. In Advances in Organometallic Chemistry; Elsevier: Amsterdam, The Netherlands, 2015; Volume 64, pp. 1–39. [Google Scholar] [CrossRef]
  54. Hearle, J.W.S. Textile Fibers: A Comparative Overview. In Encyclopedia of Materials: Science and Technology; Elsevier: Amsterdam, The Netherlands, 2001; pp. 9100–9116. [Google Scholar] [CrossRef]
Figure 1. Methodology of CAMD development to design air separation polymeric membrane.
Table 1. Simplified polymer information system.
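Polymer | Conditional Attribute C1 | Conditional Attribute C2 | Conditional Attribute C3 | Decision Attribute D1
P1 | 5.9 | 6.5 | 3.1 | 1
P2 | 5.2 | 0 | 0 | 1
P3 | 7.7 | 7.2 | 4.9 | 2
P4 | 9.2 | 8.9 | 5.6 | 2
P5 | 5.5 | 5.3 | 3.4 | 1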
Table 2. Example rules generated from selectivity reduct 1.
Rule | Kappa Order 3 | Kappa Alpha Order 2 | Decision | Strength | Coverage (Recall) | Certainty (Precision) | Accuracy
2 | 5.432 to 11.545 | - | Class 1 (Selectivity < 4) | 13.51% | 29.41% | 100% | 83%
10 | < 3.828 | 0.671 to 1.773 | Class 2 (Selectivity ≥ 4) | 10.81% | 20% | 100% | 60%
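The strength, coverage (recall) and certainty (precision) columns can be illustrated with a minimal Python sketch. It assumes the standard rough-set definitions (strength = matches in the class / all objects, coverage = matches in the class / class size, certainty = matches in the class / all matches) and uses the five polymers as read from Table 1. The rule `C1 >= 7.0 -> D1 = 2` and the helper name `rule_metrics` are hypothetical and chosen only for illustration; they are not rules or code from this work.

```python
from typing import Callable, Dict, List

# Values as read from Table 1 (conditional attributes C1-C3, decision D1).
TABLE_1: List[Dict[str, float]] = [
    {"C1": 5.9, "C2": 6.5, "C3": 3.1, "D1": 1},
    {"C1": 5.2, "C2": 0.0, "C3": 0.0, "D1": 1},
    {"C1": 7.7, "C2": 7.2, "C3": 4.9, "D1": 2},
    {"C1": 9.2, "C2": 8.9, "C3": 5.6, "D1": 2},
    {"C1": 5.5, "C2": 5.3, "C3": 3.4, "D1": 1},
]


def rule_metrics(
    rows: List[Dict[str, float]],
    condition: Callable[[Dict[str, float]], bool],
    decision_value: float,
    decision_key: str = "D1",
) -> Dict[str, float]:
    """Return strength, coverage and certainty of 'condition -> decision_value'."""
    matched = [r for r in rows if condition(r)]                      # objects satisfying the condition
    in_class = [r for r in rows if r[decision_key] == decision_value]  # objects in the decision class
    both = [r for r in matched if r[decision_key] == decision_value]   # objects supporting the rule
    return {
        "strength": len(both) / len(rows),
        "coverage": len(both) / len(in_class) if in_class else 0.0,
        "certainty": len(both) / len(matched) if matched else 0.0,
    }


# Hypothetical rule for illustration only: C1 >= 7.0 -> D1 = 2.
metrics = rule_metrics(TABLE_1, lambda r: r["C1"] >= 7.0, decision_value=2)
print(metrics)
```

For this toy rule, P3 and P4 are the only matches and both belong to class 2, so two of five objects support the rule and the whole class is covered.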
Table 3. Rules selected for CAMD modelling.
Rule | Decision | Strength | Coverage (Recall) | Certainty (Precision) | Accuracy
x1 ≥ 2.94 and Kappa Order 3 < 2.98 | Tg = Class 2 | 31% | 44% | 89% | 83%
Kappa Alpha Order 2 ≥ 7.03, or Kappa Alpha Order 3 from 5.16 to 6.31 | Vm = Class 2 | 42.3% | 50% | 100% | 85%
x0 ≥ 2.5 and E-state Index < 13.81, or x1 from 1.404 to 3.59 and E-state Index < 15.08, or Kappa Alpha Order 1 ≥ 2.72 and E-state Index < 15.08 | Ecoh = Class 1 | 29.4% | 100% | 100% | 86%
E-state Index < 18.25 and Kappa Order 3 ≥ 4.67, or x0 from 4.63 to 5.08 | Permeability = Class 2 | 11.1% | 40% | 100% | 83%
Kappa Order 3 < 3.83 and Kappa Alpha Order 2 from 0.67 to 1.77, or Kappa Flexibility Index from 2.72 to 3.32 | Selectivity = Class 2 | 12.5% | 28.57% | 100% | 89%
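As a rough illustration of how a selected rule can act as a screening check, the sketch below encodes the Selectivity = Class 2 rule from Table 3 as a Python predicate and applies it to the descriptor values listed for the three polymers in Table 5. The function names, the dictionary keys and the half-open reading of "from a to b" as a ≤ value < b are assumptions made for this example; the actual CAMD formulation may encode the rule differently.

```python
from typing import Dict


def in_range(value: float, low: float, high: float) -> bool:
    """Assumed convention: 'from low to high' means low <= value < high."""
    return low <= value < high


def selectivity_class_2(d: Dict[str, float]) -> bool:
    """Selectivity = Class 2 rule of Table 3 (two alternative conditions)."""
    rule_a = d["kappa3"] < 3.83 and in_range(d["kappa_alpha2"], 0.67, 1.77)
    rule_b = in_range(d["phi"], 2.72, 3.32)  # Kappa Flexibility Index
    return rule_a or rule_b


# Descriptor values as read from Table 5 (Kappa Order 3, Kappa Alpha Order 2, flexibility index).
candidates = {
    "Polycarbonate": {"kappa3": 1.333, "kappa_alpha2": 0.0, "phi": 1.033},
    "Polyphenylene Oxide": {"kappa3": 0.490, "kappa_alpha2": 0.874, "phi": 0.402},
    "Polymethyl Methacrylate": {"kappa3": 2.667, "kappa_alpha2": 2.533, "phi": 2.307},
}

results = {name: selectivity_class_2(d) for name, d in candidates.items()}
for name, matched in results.items():
    print(f"{name}: rule matched = {matched}")
```

Because the rule covers only 28.57% of Class 2, a polymer that does not match it is not thereby assigned to Class 1; the rule only certifies the cases it does match.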
Table 4. CAMD results.
Polymer Name | Poly(1-Hexene) | Poly(4-Methyl-1-Pentene) | Poly(5-Methyl-Hexene-1) | Poly(3-Chlorohexene)
Formula | C6H12 | C6H12 | C7H14 | C6H11Cl
CAS number | 592-41-6 | 691-37-2 | 3524-73-0 | 53101-38-5
Structural Assumptions | CH=CH2 attach to one CH2; CH3 only attach to CH2 | CH=CH2 attach to one CH2; (CH3)2CH only attach to CH2 | CH=CH2 attach to one CH2; (CH3)2CH only attach to CH2 | CH=CH2 attach to one CHCl; (CH3)2CH only attach to CH2
TS * | 3 | 4 | 5 | 2
O2 Permeability (Barrers) | 10 | 32.3 | 20 | Not available
O2 Selectivity | 2.6 | 4.225 | 2.5 | Not available
x0 | 4.406 | 4.992 | 5.698 | 5.698
x1 | 2.932 | 2.770 | 3.270 | 3.3081
xV1 | 2.932 | 2.379 | 2.879 | 3.011
E-state index | 11.5 | 11.833 | 13.333 | 15.4444
¹κ | 6 | 6 | 7 | 7
²κ | 5 | 3.2 | 4.167 | 4.167
³κ | 5.333 | 5.333 | 6 | 3.840
¹κα | 5.740 | 5.740 | 6.740 | 7.026
²κα | 4.740 | 2.951 | 3.915 | 4.192
³κα | 5.105 | 5.105 | 5.740 | 3.867
Φ | 4.535 | 2.824 | 3.769 | 4.208
Tg (K) | 223 | 302 | 259 | Not available
Vm (cm3/mol) | 97.9 | 235 | 139.6 | Not available
Ecoh (J/mol) | 13,000 | 26,160 | 7900 | Not available
Literature TS (MPa) | 39 | 28 | 40 | Not available
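The Φ row in Table 4 can be cross-checked against the two kappa alpha rows. The sketch below assumes Kier's usual definition of the flexibility index, Φ = (¹κα × ²κα) / A, with A the number of non-hydrogen atoms in the monomer; the atom counts (6, 6, 7, 7) are inferred here from the formulas in Table 4 and are not stated in the table itself. The function and variable names are illustrative.

```python
def kappa_flexibility_index(kappa_alpha_1: float, kappa_alpha_2: float, heavy_atoms: int) -> float:
    """Kier flexibility index: (1-kappa-alpha * 2-kappa-alpha) / number of non-hydrogen atoms."""
    return kappa_alpha_1 * kappa_alpha_2 / heavy_atoms


# (1-kappa-alpha, 2-kappa-alpha, assumed heavy-atom count, flexibility index listed in Table 4)
TABLE_4 = {
    "Poly(1-Hexene)": (5.740, 4.740, 6, 4.535),
    "Poly(4-Methyl-1-Pentene)": (5.740, 2.951, 6, 2.824),
    "Poly(5-Methyl-Hexene-1)": (6.740, 3.915, 7, 3.769),
    "Poly(3-Chlorohexene)": (7.026, 4.192, 7, 4.208),
}

computed = {
    name: kappa_flexibility_index(ka1, ka2, n)
    for name, (ka1, ka2, n, _listed) in TABLE_4.items()
}

for name, (_ka1, _ka2, _n, listed) in TABLE_4.items():
    # Differences of about 0.001 are expected because the inputs are rounded to three decimals.
    print(f"{name}: computed {computed[name]:.3f}, listed {listed:.3f}")
```

Under these assumptions the computed values agree with the listed Φ values to within rounding of the tabulated inputs.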
Table 5. CAMD results (continued).
Polymer Name | Polycarbonate | Polyphenylene Oxide | Polymethyl Methacrylate
Formula | C15H16O2 | C8H8O | C5O2H8
CAS number | 25037-45-0 | 25134-01-4 | 9011-14-7
Structural Assumptions | C6H4O attach to one C; C only attach to CH3 and C6H4COO | C6H3O attach to CH3 | C=CH2 attach to one COO and CH3
TS * | 1 | 7 | 6
O2 Permeability (Barrers) | 1.5 | 16.8 | 20
O2 Selectivity | 5.769 | 4.421 | 3.71
x0 | 3.577 | 4.690 | 5.492
x1 | 1.732 | 3.450 | 3.189
xV1 | 1.354 | 2.230 | 2.274
E-state index | 8.667 | 11.095 | 20.833
¹κ | 4 | 3.938 | 7
²κ | 3.740 | 1.240 | 3.061
³κ | 1.333 | 0.490 | 2.667
¹κα | 1.105 | 3.218 | 6.377
²κα | 0 | 0.874 | 2.533
³κα | 0 | 0.302 | 2.121
Φ | 1.033 | 0.402 | 2.307
Tg (K) | 423 | 488 | 378
Vm (cm3/mol) | 320 | 76.6 | 89.3
Ecoh (J/mol) | 14,400 | 33,300 | 27,700
Literature TS (MPa) | 62.1 | 75 | 50
* Tensile strength is ranked based on CAMD results.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cheun, J.-Y.; Liew, J.-Y.-L.; Tan, Q.-Y.; Chong, J.-W.; Ooi, J.; Chemmangattuvalappil, N.G. Design of Polymeric Membranes for Air Separation by Combining Machine Learning Tools with Computer Aided Molecular Design. Processes 2023, 11, 2004. https://doi.org/10.3390/pr11072004
