A Clustering Model for Three-Way Asymmetric Proximities: Unveiling Origins and Destinations. (2024)

Author(s): Laura Bocci [1,†]; Donatella Vicari (corresponding author) [2,*,†]

1. Introduction

Three-way proximity data are a structure of relationship data that is collected or measured between pairs of objects under multiple occasions, i.e., sources, settings, times, and experimental conditions. They extend the standard two-way proximity data, where the same objects are the entities in relation, to a three-dimensional framework. The additional dimension is given by the occasion on which the single pairwise proximity (similarity or dissimilarity) was collected. Three-way proximity data are typically arranged as a set of square data matrices (one for each occasion), each containing the proximities between all pairs of objects, and can be either symmetric or asymmetric.

In the latter case, where the relationship between the objects A and B is different from that between B and A, the asymmetry denotes a certain disequilibrium (imbalance) in the pairwise relationship. As a result, not only the presence of such relationships but also their directionality and magnitude, both of which are significant and cannot be ignored, define the true pattern of the data.

This phenomenon occurs in a variety of disciplines, including psychology, sociology, marketing research, and the social, behavioral, and environmental sciences. Asymmetric proximity data can stem from diverse sources, such as judgments concerning the similarity between stimuli (confusion data), interactions among social actors (social networks), patterns of mobility, transitions between employment sectors, brand switching behaviors, or commercial transactions and trades between countries.

When dealing with asymmetric data, it may be of interest to unveil common behaviors of exchange in order to identify groups of objects that are primarily either origins or destinations of the exchanges, and in the presence of multiple occasions, such common patterns can also be analyzed and compared across them. For instance, the international student mobility between countries over several years yields a three-way asymmetric proximity array, where, in each annual matrix, the rows and columns correspond to the origins and destinations of the mobile students, respectively. The identification of possible pathways of student mobility over time can be an important source of information for policy makers.

Specially designed multiway models and methods are required to analyze such data. Within this framework, clustering three-way data is a complex and challenging task since asymmetric proximity matrices may subsume different classifications of the objects due to the heterogeneity across occasions, and the asymmetry may contain some information relevant to clustering efforts.

When analyzing asymmetric data, the asymmetry is often ignored by symmetrizing the proximities by averaging the two different values for each pair of objects. Nevertheless, asymmetry can be substantial and prominent, and deserves attention because it contains important information about the real patterns in the data, in terms of both the direction and the magnitude of the relationships.

A review of the literature on cluster analysis reveals a notable disparity: methods for asymmetric proximity data have received less attention compared with the large number of models and methods that have been proposed for symmetric proximity data, and even less for three-way data.

Clustering methods for asymmetric data have been developed for a single data matrix mainly following two approaches (see [1,2,3] for extended reviews of clustering methods for asymmetric data).

In the first approach, two different classification structures are estimated, one for the rows and one for the columns of the asymmetric matrix, since rows and columns are assumed to refer to two different sets of objects. Within this approach, the majority of the proposed clustering algorithms generally estimate hierarchical trees and identify non-overlapping clusters, i.e., each object belongs to only one cluster [4,5,6,7]. A non-hierarchical clustering algorithm, called GENNCLUS, that provides either overlapping or non-overlapping clusters for asymmetric or symmetric data has been proposed as a generalization of ADCLUS [8].

In the second approach, a single classification structure is estimated from both rows and columns of the asymmetric matrix because, in accordance with their true nature, they refer to the same set of objects. Within this approach, most of the methodologies extend the classical aggregative hierarchical methods [9,10,11,12], while few proposals concern non-hierarchical methods. Some of the latter are basically extensions of the k-means clustering [13,14,15]. In the same vein but with a different modeling, more recent proposals [16,17] concern non-hierarchical clustering methods that simultaneously fit both the symmetric and the skew-symmetric parts of the data resulting from the algebraic decomposition of the asymmetric matrix (see Section 1.1 for the definition). Specific clustering models for skew-symmetric data have been proposed by Vicari [18] and Vicari and Di Nuzzo [19]. In the first model, possible external covariates for the objects have been included, while the second proposal relies on the between-cluster effects modeled by Singular Value Decompositions that exploit the peculiar properties of the skew-symmetric matrices.

Regarding three-way asymmetric proximities, several models have been proposed within the framework of Multidimensional Scaling (MDS) (see [3] for a review of MDS methods for multiway asymmetric data), but to our knowledge, only one clustering model has been proposed in order to analyze three-way asymmetric proximity data. The proposal of Chaturvedi and Carroll [20] generalizes the INDCLUS model [21] to the asymmetric case by identifying two different sets of (overlapping) clusters of the objects (for the rows and the columns of the data matrices, respectively) common to all occasions, while the three-way heterogeneity is accounted for by occasion-specific weights for the clusters.

We can observe that since Generalized INDCLUS [20] aims at finding two different sets of (possibly overlapping) clusters of objects, the identification of origins and destinations of exchange patterns is rather complicated from an interpretative point of view. This is because the clusters are only partially matched, which often leads to non-parsimonious solutions. Conversely, in many scenarios, when dealing with this type of data, the goal is to analyze the exchange between the same groups of objects that serve as either origins or destinations, because they contain objects with common exchange behavior toward the other groups across occasions, both in magnitude and in direction.

Accordingly, given the lack of proposals for clustering three-way asymmetric proximities, our research objectives are (1) to identify the same groups of objects (for the rows and columns of the data matrices) that are primarily origins or destinations of the exchanges and, at the same time, (2) to measure the extent to which these clusters differ across occasions.

To fill this research gap, we start from the clustering model proposed by Vicari [17], which addresses the first issue and accounts for between-cluster effects when only a single dissimilarity matrix is available (two-way case). In the present paper, we generalize the model [17] to the three-way case with the aim of also accounting for heterogeneity across occasions.

The model is based on the decomposition of each asymmetric matrix into the sum of its symmetric and skew-symmetric components, which are modeled jointly. The asymmetric dissimilarities are assumed to subsume two clustering structures common to all occasions: the first defines a standard partitioning of all objects that fits the symmetric component of the exchanges; the second one, which fits the imbalances, defines an incomplete partition of objects, some of which are allowed to remain unallocated to any cluster. In both clustering structures, objects within the same cluster share the same behavior with respect to exchanges that are directed to objects in different clusters so that “origin” and “destination” clusters are identified. Objects possibly unassigned to any cluster of the incomplete partition represent “nearly” symmetric objects, characterized by small imbalances. As a novel contribution, this paper accounts for heterogeneity across occasions by estimating occasion-specific sets of weights that can capture both the average magnitude and the direction of exchange between clusters. This makes it possible to analyze the role of the common clusters as either origin or destination, which may differ across occasions.

This paper is organized as follows: After an illustrative example to introduce the problem (Section 1.1), the model is formalized in a general framework in Section 2 and an appropriate algorithm is proposed in Section 3. Applications to the artificial data of Section 1.1 and to real mobility data are presented in Section 4 and Section 5, respectively, to illustrate the usefulness and effectiveness of the proposal, also in comparison with the Generalized INDCLUS [20]. Finally, Section 6 summarizes the findings and concludes with directions for future developments.

1.1. Illustrative Example: Artificial Data

Before discussing the model in detail, an illustrative example of the heterogeneous three-way asymmetric data we are dealing with is presented.

Let us consider a three-way array of asymmetric dissimilarity data pertaining to pairwise exchanges between N=9 objects measured at H=3 occasions. The three matrices X[sub.h] (h=1,2,3) in Figure 1, containing the exchanges of each of the three occasions, have been artificially generated from model (4), which will be fully formalized in Section 2, and by assuming three clusters of objects, namely, C[sub.1]={a,b,c,d}, C[sub.2]={e,f}, and C[sub.3]={g,h,i}.

Exchanges are generally asymmetric: for example, the exchange from a to e is different from the exchange from e to a in each occasion. Furthermore, the exchanges from a to all other objects are greater than the exchanges from all other objects to a in occasions 1 and 3, while the opposite is true for occasion 2.

In addition, using the Gower decomposition [22], each matrix X[sub.h] (h=1,2,3) can be decomposed into its symmetric and skew-symmetric components S[sub.h] and K[sub.h], respectively, as shown in Figure 2.

Let us recall the Gower decomposition [22]: any square matrix $X_h$ ($h=1,\dots,H$) can be uniquely decomposed into the sum of a symmetric matrix $S_h$ and a skew-symmetric matrix $K_h$, both of size $(N\times N)$ and orthogonal to each other (i.e., $\mathrm{trace}(S_h K_h)=0$),

$$X_h = S_h + K_h = \tfrac{1}{2}(X_h + X_h') + \tfrac{1}{2}(X_h - X_h'), \qquad (h=1,\dots,H), \tag{1}$$

as elementary linear algebra can easily prove. The entry $s_{ijh}\in S_h$ represents the average amount of the exchange between objects i and j at occasion h ($h=1,\dots,H$), while the entry $k_{ijh}\in K_h$ represents the imbalance between i and j, i.e., the amount by which $x_{ijh}$ differs from its mean $s_{ijh}$ ($i,j=1,\dots,N$ and $h=1,\dots,H$). Thus, each element of the skew-symmetric matrix $K_h$ is, by definition, such that $k_{ijh}=-k_{jih}$, and conveys information about the direction of the exchange.

According to this decomposition, every exchange between a pair of objects in matrix X[sub.h] (h=1,2,3), for example, x[sub.ae1]=66.2 from the object a to the object e in the first occasion, can be decomposed into the sum of two entries: (1) the average amount of the exchange between a and e, s[sub.ae1]=34.0 in S[sub.1], and (2) the imbalance between the exchanges from a to e and from e to a, k[sub.ae1]=32.2 in K[sub.1], i.e., the amount by which the exchange between a and e differs from its mean s[sub.ae1].
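As a minimal sketch of decomposition (1), the following Python snippet splits a square matrix into its symmetric and skew-symmetric parts. It is not part of the original paper: the function name `gower_decomposition` and the 2×2 toy matrix are illustrative assumptions. The toy matrix reuses the values quoted above for the pair (a, e) in the first occasion, so the reverse exchange is taken to be 34.0 − 32.2 = 1.8.

```python
def gower_decomposition(X):
    """Split a square matrix (list of lists) into symmetric and skew-symmetric parts."""
    n = len(X)
    S = [[0.5 * (X[i][j] + X[j][i]) for j in range(n)] for i in range(n)]
    K = [[0.5 * (X[i][j] - X[j][i]) for j in range(n)] for i in range(n)]
    return S, K


# Hypothetical 2x2 sub-matrix for objects a and e only (assumption for illustration):
# x_ae = 66.2 is quoted in the text; x_ea = 1.8 follows from s_ae = 34.0 and k_ae = 32.2.
X = [[0.0, 66.2],
     [1.8, 0.0]]
S, K = gower_decomposition(X)

# Orthogonality check: trace(S K) should vanish.
trace_SK = sum(S[i][j] * K[j][i] for i in range(2) for j in range(2))
```

Here `S[0][1]` recovers the average amount and `K[0][1]` the imbalance, while `K[1][0]` carries the opposite sign.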

Based on the Gower decomposition, it is also possible to measure the percentage of asymmetry of each matrix $X_h$ as the squared ratio between the Frobenius norm of its skew-symmetric component $K_h$ and the Frobenius norm of $X_h$, i.e., $100\,\|K_h\|^2/\|X_h\|^2$, which is a measure in [0,100].

The percentage of asymmetry of each of the matrices in Figure 1 is not negligible (15.5%, 17.8%, and 17.0%), which confirms that the structure of the data is characterized by the direction and magnitude of the exchanges, which are both relevant and cannot be ignored.
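The percentage of asymmetry can be computed directly from a data matrix. The sketch below is an illustration under stated assumptions: the function name `asymmetry_percentage` and the small test matrices are hypothetical and are not the data of Figure 1.

```python
def asymmetry_percentage(X):
    """Return 100 * ||K||^2 / ||X||^2, with K the skew-symmetric part of X."""
    n = len(X)
    norm_x = sum(X[i][j] ** 2 for i in range(n) for j in range(n))
    norm_k = sum((0.5 * (X[i][j] - X[j][i])) ** 2
                 for i in range(n) for j in range(n))
    return 100.0 * norm_k / norm_x


symmetric = [[0.0, 1.0], [1.0, 0.0]]    # no imbalance at all
skew = [[0.0, 2.0], [-2.0, 0.0]]        # pure imbalance
mixed = [[0.0, 3.0], [1.0, 0.0]]        # average amount 2, imbalance 1

pct_symmetric = asymmetry_percentage(symmetric)
pct_skew = asymmetry_percentage(skew)
pct_mixed = asymmetry_percentage(mixed)
```

The two extreme cases give the bounds of the measure, and the mixed case falls in between.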

The underlying pattern in the data is evident from Figure 1. In each heatmap, different shades of blue represent different magnitudes of exchange between pairs of objects in different clusters, with darker shades representing higher magnitudes. With a view to searching for origin/destination clusters, we are interested in modeling the exchange between clusters, while the exchange within clusters is not of interest here. Thus, the diagonal blocks are left white for better clarity. The heatmap of each data matrix shows three “blue” off-diagonal blocks, corresponding to the three clusters C[sub.1]={a,b,c,d}, C[sub.2]={e,f}, and C[sub.3]={g,h,i}, each with a common pattern across matrices.

Furthermore, the same clustering structure can be identified in both symmetric and skew-symmetric components S[sub.h] and K[sub.h] across all occasions (Figure 2).

By analyzing the symmetric and skew-symmetric components of each data matrix, it becomes easier to disclose the pattern of the data and determine which groups of objects serve primarily as the origin or destination of the exchanges.

Objects in the same cluster have a common behavior towards objects in different clusters, in terms of outgoing and incoming exchanges, but the magnitude and direction of the exchanges change across occasions. As an example, if we look at the first and third occasions, it is clear that the outgoing exchanges from all objects in C[sub.2] directed to objects in C[sub.1] and C[sub.3] are smaller than the incoming exchanges. This results in negative imbalances for the exchanges from all objects in C[sub.2] to objects in either C[sub.1] or C[sub.3]. Cluster C[sub.2] can therefore be considered as the origin cluster for these two occasions. Conversely, in the second occasion, C[sub.2] serves as a destination cluster, as its outgoing amounts towards C[sub.1] and C[sub.3] are greater than its incoming ones, resulting in positive imbalances.

By inspecting all the symmetric and skew-symmetric components, we may also note that both the objects b and c (in cluster C[sub.1]) present a common behavior across occasions: they have almost symmetrical exchanges with the objects g and h (in cluster C[sub.3]) and small imbalances with all other objects belonging to either cluster C[sub.2] or C[sub.3].

In order to have a better understanding of the underlying patterns and identify the common behaviors of the objects across the different occasions, the model formalized in Section 2 is fitted to the artificial data and the results analyzed in Section 4, together with the results from the Generalized INDCLUS model [20] for comparison.

2. The Model

Let us assume that X[sub.h] (h=1,…,H) is a square asymmetric matrix where the element x[sub.ilh] represents the pairwise dissimilarity between the objects i and l (i,l=1,…,N) observed at the occasion h (h=1,…,H) and is generally different from x[sub.lih].

The model proposed here aims at clustering the N objects by accounting for both the symmetric and the skew-symmetric effects from the decomposition (1) of the observed asymmetries X[sub.h] (h=1,…,H).

In particular, the data are assumed to subsume two clustering structures that are common to all occasions: the first one defines a standard partitioning of all objects fitting the average amount of the exchanges; the second one, which fits the imbalances, defines an “incomplete” partitioning of the objects, where some of them are allowed to remain unassigned.

Specifically, the clustering structure consists of two nested partitions into J clusters {C[sub.1],…,C[sub.j],…,C[sub.J]} and {G[sub.1],…,G[sub.j],…,G[sub.J]}, such that

* Every object belongs to one and only one non-empty cluster C[sub.j] (j=1,…,J);

* If object i belongs to cluster C[sub.j], it can either belong to cluster G[sub.j] or remain unassigned to any cluster of the partition {G[sub.1],…,G[sub.j],…,G[sub.J]}.

The partition $\{C_1,\dots,C_j,\dots,C_J\}$ is referred to as a complete partition because every object must be assigned to some cluster $C_j$, while $\{G_1,\dots,G_j,\dots,G_J\}$ is called an incomplete partition because a number $N_0$ ($N_0\le N$) out of N objects are allowed to remain unassigned to any cluster. Note that the complete and the incomplete partitions are common to all occasions and linked together, the latter being constrained to be nested within the former ($G_j\subseteq C_j$ for $j=1,\dots,J$).

The complete partition is uniquely identified by an $(N\times J)$ binary membership matrix $U=[u_{ij}]$ ($u_{ij}\in\{0,1\}$ for $i=1,\dots,N$ and $j=1,\dots,J$, and $\sum_{j=1}^{J}u_{ij}=1$ for $i=1,\dots,N$), where $u_{ij}=1$ if object i belongs to cluster $C_j$, and $u_{ij}=0$ otherwise.

The incomplete partition is identified by an $(N\times J)$ binary membership matrix $V=[v_{ij}]$ ($v_{ij}\in\{0,1\}$ for $i=1,\dots,N$ and $j=1,\dots,J$), where $v_{ij}\le u_{ij}$ ($i=1,\dots,N$ and $j=1,\dots,J$); i.e., any object i can either remain unassigned to any cluster or belong to cluster $G_j$ if it belongs to cluster $C_j$ in the complete partition.
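The constraints on the two membership matrices can be checked mechanically. The following sketch is illustrative only: the function name `is_valid_nested_partition` is an assumption, the matrix U encodes the three clusters of the artificial example, and the choice of leaving objects b and c unassigned in V is hypothetical.

```python
def is_valid_nested_partition(U, V):
    """Check that U is a complete partition and V is nested within it."""
    for u_row, v_row in zip(U, V):
        if any(x not in (0, 1) for x in u_row + v_row):
            return False                      # entries must be binary
        if sum(u_row) != 1:
            return False                      # each object in exactly one C_j
        if any(v > u for u, v in zip(u_row, v_row)):
            return False                      # G_j must be contained in C_j
    n_clusters = len(U[0])
    # Every cluster C_j of the complete partition must be non-empty.
    return all(sum(row[j] for row in U) > 0 for j in range(n_clusters))


# Objects a..i with C1={a,b,c,d}, C2={e,f}, C3={g,h,i}.
U = [[1, 0, 0]] * 4 + [[0, 1, 0]] * 2 + [[0, 0, 1]] * 3
V = [row[:] for row in U]
V[1] = [0, 0, 0]   # hypothetical: object b left unassigned
V[2] = [0, 0, 0]   # hypothetical: object c left unassigned

V_bad = [row[:] for row in U]
V_bad[0] = [0, 1, 0]   # object a placed in G2 although it belongs to C1

ok = is_valid_nested_partition(U, V)
not_ok = is_valid_nested_partition(U, V_bad)
```

A row of zeros in V marks an object that stays outside the incomplete partition.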

Hereafter, $I_N$ denotes the identity matrix of size N; $1_{AB}$ and $1_A$ denote the matrix of size $(A\times B)$ of all ones and the column vector with A ones, respectively; $\tilde I_N=(1_{NN}-I_N)$ is the $(N\times N)$ matrix of ones except for the zeros on the main diagonal; and $\underline{Y}=[Y_1,\dots,Y_H]$ is the $(M\times PH)$ augmented matrix obtained by collecting the H matrices $Y_1,\dots,Y_H$ of size $(M\times P)$ next to each other.

Let us consider the Gower decomposition (1) of each matrix X[sub.h] (h=1,…,H) into the sum of its symmetric and skew-symmetric components S[sub.h] and K[sub.h], respectively. Both of these components can be modeled by defining two clustering structures that depend on the matrices U and V, respectively, as introduced in Vicari [17] for a two-way asymmetric dissimilarity matrix.

Specifically, the symmetric component $S_h$ and the skew-symmetric component $K_h$ for the occasion h ($h=1,\dots,H$) are modeled by the two clustering structures introduced in Vicari [16,18] and depend on the common complete and incomplete membership matrices U and V, respectively, as follows:

$$S_h = U R_h \tilde U' + \tilde U R_h U' + E_{hS}, \qquad (h=1,\dots,H), \tag{2}$$

$$K_h = V T_h \tilde V' - \tilde V T_h V' + E_{hK}, \qquad (h=1,\dots,H), \tag{3}$$

where

* $\tilde U=1_{NJ}-U$ and $\tilde V=1_{NJ}-V$;

* $R_h=\mathrm{diag}(r_h)$ and $T_h=\mathrm{diag}(t_h)$ with $r_h=[r_{1h},\dots,r_{Jh}]'$ and $t_h=[t_{1h},\dots,t_{Jh}]'$ being the occasion-specific weight vectors of size J associated with the clusters of the complete and incomplete partition, respectively;

* The error terms $E_{hS}$ and $E_{hK}$ represent the parts of $S_h$ and $K_h$ not accounted for by the model, respectively.

For identifiability reasons, any matrix (VT[sub.h]) is constrained to sum to zero: i.e., 1[sub.N][sup.'](VT[sub.h])1[sub.J]=0 (h=1,…,H).

Models (2) and (3) can be combined and plugged into (1) to specify the model accounting for the asymmetric dissimilarities between clusters at the occasion h, as follows:

$$\begin{aligned} X_h &= S_h + K_h + b_h \tilde I_N + E_h \\ &= [U R_h \tilde U' + \tilde U R_h U'] + [V T_h \tilde V' - \tilde V T_h V'] + b_h \tilde I_N + E_h, \qquad (h=1,\dots,H), \end{aligned} \tag{4}$$

where $b_h$ is an additive constant term and the general error term $E_h$ represents the part of $X_h$ not accounted for by the model.
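To make the structure of model (4) concrete, the sketch below builds the fitted part of one occasion entry by entry, without the error term. It is a toy illustration, not the authors' MATLAB code: the function name `fitted_matrix` and all numeric values are assumptions. For two objects in different clusters a and b that are both assigned in the incomplete partition, the fitted value is r_a + r_b + (t_a − t_b) + b_h.

```python
def fitted_matrix(U, V, r, t, b):
    """Fitted part of model (4) for one occasion (no error term)."""
    n, n_clusters = len(U), len(U[0])
    X = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for l in range(n):
            if i == l:
                continue  # the diagonal is zero by construction
            # Symmetric part: entry (i, l) of U R U~' + U~ R U'
            sym = sum(r[j] * (U[i][j] * (1 - U[l][j]) + (1 - U[i][j]) * U[l][j])
                      for j in range(n_clusters))
            # Skew-symmetric part: entry (i, l) of V T V~' - V~ T V'
            skew = sum(t[j] * (V[i][j] * (1 - V[l][j]) - (1 - V[i][j]) * V[l][j])
                       for j in range(n_clusters))
            X[i][l] = sym + skew + b
    return X


# Hypothetical toy setting: 3 objects, 2 clusters, all objects assigned in V.
U = [[1, 0], [1, 0], [0, 1]]
V = [[1, 0], [1, 0], [0, 1]]
r = [2.0, 3.0]
t = [1.0, -2.0]    # satisfies the zero-sum constraint: 2 * 1 + 1 * (-2) = 0
b = 10.0
X_fit = fitted_matrix(U, V, r, t, b)
```

In this toy case the exchange from the first object to the third is 5 + 3 + 10 = 18, while the reverse exchange is 5 − 3 + 10 = 12.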

Models (2) and (3) can be expressed in compact notation in terms of $\underline{S}=[S_1,\dots,S_H]$ and $\underline{K}=[K_1,\dots,K_H]$, which denote the $(N\times NH)$ augmented matrices obtained by collecting the H matrices $S_h$ and $K_h$ next to each other, respectively, as follows:

$$\underline{S} = (1_H'\otimes U)\,R\,(I_H\otimes \tilde U') + (1_H'\otimes \tilde U)\,R\,(I_H\otimes U') + \underline{E}_S, \tag{5}$$

$$\underline{K} = (1_H'\otimes V)\,T\,(I_H\otimes \tilde V') - (1_H'\otimes \tilde V)\,T\,(I_H\otimes V') + \underline{E}_K, \tag{6}$$

where $\otimes$ denotes the Kronecker product, R and T are the two $(HJ\times HJ)$ diagonal matrices with the HJ-vectors $r=[r_1',\dots,r_h',\dots,r_H']'$ and $t=[t_1',\dots,t_h',\dots,t_H']'$ as main diagonals, respectively, $\underline{E}_S=[E_{1S},\dots,E_{hS},\dots,E_{HS}]$, and $\underline{E}_K=[E_{1K},\dots,E_{hK},\dots,E_{HK}]$.

Recall that, given any two matrices $A=[a_{ij}]$ and B of sizes $(N\times J)$ and $(M\times P)$, respectively, the Kronecker product between A and B is the $(NM\times JP)$ matrix, as follows:

$$A\otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1J}B \\ \vdots & \ddots & \vdots \\ a_{N1}B & \cdots & a_{NJ}B \end{bmatrix}.$$
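A small pure-Python sketch of the Kronecker product may help to read the compact notation. The function name `kron` and the example matrices are assumptions for illustration. The second example shows that the product of a row of H ones with U simply places H copies of U side by side.

```python
def kron(A, B):
    """Kronecker product of A (n x j) and B (m x p) as lists of lists."""
    n, j = len(A), len(A[0])
    m, p = len(B), len(B[0])
    return [[A[row // m][col // p] * B[row % m][col % p]
             for col in range(j * p)]
            for row in range(n * m)]


# Basic example: [1 2] (x) I_2
basic = kron([[1, 2]], [[1, 0], [0, 1]])

# With H = 2 occasions, 1_H' (x) U repeats U horizontally.
U = [[1, 0], [0, 1], [0, 1]]        # hypothetical membership matrix
ones_row = [[1, 1]]                 # 1_H' for H = 2
repeated_U = kron(ones_row, U)
```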

Finally, model (4) can be expressed in compact notation in terms of the augmented matrix $\underline{X}=[X_1,\dots,X_H]$ by combining models (5) and (6) as follows:

$$\begin{aligned} \underline{X} &= \underline{S} + \underline{K} + (b'\otimes \tilde I_N) + \underline{E} \\ &= (1_H'\otimes U)\,R\,(I_H\otimes \tilde U') + (1_H'\otimes \tilde U)\,R\,(I_H\otimes U') \\ &\quad + (1_H'\otimes V)\,T\,(I_H\otimes \tilde V') - (1_H'\otimes \tilde V)\,T\,(I_H\otimes V') + (b'\otimes \tilde I_N) + \underline{E}, \end{aligned} \tag{7}$$

where $b=[b_1,\dots,b_h,\dots,b_H]'$ and $\underline{E}=[E_1,\dots,E_h,\dots,E_H]$.

It is important to note that, in the model, in addition to a common clustering structure, occasion-specific weight vectors r[sub.h] and t[sub.h] are assumed to account for the heterogeneity of the occasions. These weights allow for measuring the extent to which exchanges vary across occasions, providing quantifications of the exchanges between clusters at the occasion h in terms of magnitude and direction.

3. The Algorithm

In model (4), the complete and the incomplete membership matrices U and V, the weight vectors $r_h$ and $t_h$, and the constants $b_h$ ($h=1,\dots,H$) can be estimated by solving the following least-squares fitting problem:

$$\min F(U,V,r_h,t_h,b_h) = \frac{\sum_{h=1}^{H}\left\| X_h - [U R_h \tilde U' + \tilde U R_h U'] - [V T_h \tilde V' - \tilde V T_h V'] - b_h \tilde I_N \right\|^2}{\sum_{h=1}^{H}\left\| X_h \right\|^2} \tag{8}$$

subject to

$$u_{ij}\in\{0,1\}\ (i=1,\dots,N;\ j=1,\dots,J) \quad\text{and}\quad \sum_{j=1}^{J}u_{ij}=1\ (i=1,\dots,N), \tag{9}$$

$$v_{ij}\in\{0,1\}\ (i=1,\dots,N;\ j=1,\dots,J) \quad\text{and}\quad v_{ij}\le u_{ij}\ (i=1,\dots,N), \tag{10}$$

$$1_N'(V T_h)1_J=0\ (h=1,\dots,H). \tag{11}$$

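Problem (8), subject to constraints (9)–(11), can be reformulated in compact form in terms of model (7) as follows:

$$\begin{aligned} \min F(U,V,R,T,b) &= \frac{1}{\|\underline{X}\|^2}\Big\| \underline{X} - (1_H'\otimes U)R(I_H\otimes \tilde U') - (1_H'\otimes \tilde U)R(I_H\otimes U') \\ &\qquad\qquad - (1_H'\otimes V)T(I_H\otimes \tilde V') + (1_H'\otimes \tilde V)T(I_H\otimes V') - (b'\otimes \tilde I_N) \Big\|^2 \\ &= \frac{\left\| \underline{S} - (1_H'\otimes U)R(I_H\otimes \tilde U') - (1_H'\otimes \tilde U)R(I_H\otimes U') - (b'\otimes \tilde I_N) \right\|^2}{\|\underline{X}\|^2} \\ &\quad + \frac{\left\| \underline{K} - (1_H'\otimes V)T(I_H\otimes \tilde V') + (1_H'\otimes \tilde V)T(I_H\otimes V') \right\|^2}{\|\underline{X}\|^2} \\ &= F^S(U,R,b) + F^K(V,T), \end{aligned} \tag{12}$$

where the equivalence is due to the orthogonality of $\underline{S}$ and $\underline{K}$.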
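The split of the relative loss into a symmetric and a skew-symmetric term can be verified numerically. The sketch below is a toy check under stated assumptions: the helper names and the 3×3 matrices are hypothetical, `M_sym` stands for any symmetric model part (including the constant term) and `M_skew` for any skew-symmetric model part.

```python
def sq_norm(A):
    """Squared Frobenius norm of a matrix given as a list of lists."""
    return sum(x * x for row in A for x in row)


def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sym_skew(A):
    """Symmetric and skew-symmetric parts of a square matrix."""
    At = [list(col) for col in zip(*A)]
    S = [[0.5 * (a + b) for a, b in zip(ra, rb)] for ra, rb in zip(A, At)]
    K = [[0.5 * (a - b) for a, b in zip(ra, rb)] for ra, rb in zip(A, At)]
    return S, K


X = [[0.0, 5.0, 9.0], [1.0, 0.0, 4.0], [3.0, 8.0, 0.0]]          # toy data
M_sym = [[0.0, 2.0, 5.0], [2.0, 0.0, 6.0], [5.0, 6.0, 0.0]]      # symmetric model part
M_skew = [[0.0, 1.0, 2.0], [-1.0, 0.0, -1.0], [-2.0, 1.0, 0.0]]  # skew model part

S, K = sym_skew(X)
loss_total = sq_norm(subtract(subtract(X, M_sym), M_skew)) / sq_norm(X)
loss_sym = sq_norm(subtract(S, M_sym)) / sq_norm(X)
loss_skew = sq_norm(subtract(K, M_skew)) / sq_norm(X)
```

Because the residual of the symmetric part is symmetric and the residual of the skew-symmetric part is skew-symmetric, `loss_total` equals `loss_sym + loss_skew`, which is why the two components can be handled separately.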

The equivalent constrained optimization problems (8) and (12) can be solved by using an Alternating Least-Squares (ALS) algorithm, which alternates the estimation of a set of parameters while maintaining all the others fixed as detailed below.

After an initialization step in which all parameters satisfying the constraints are chosen, the algorithm alternates between two main steps.

In the first step, in order to ensure that the relative loss function (12) is non-increasing, the two membership matrices U and V are jointly updated row by row in N substeps by solving assignment problems for the different rows of U and V satisfying the constraints (9) and (10). Specifically, for a given row i, setting u[sub.ij]=1 for j=1,…,J implies that either v[sub.ij]=u[sub.ij] or v[sub.ij]=0; i.e., object i in matrix V can either be assigned to the same cluster j as in matrix U or remain unassigned. For the row i and given the remaining rows of U and V, all possible 2J assignments of the object i are considered: either to the corresponding clusters C[sub.j] and G[sub.j] (u[sub.ij]=1 and v[sub.ij]=1) or to the cluster C[sub.j] only (u[sub.ij]=1 and v[sub.ij]=0). For each of them, the weight vectors r[sub.h] and t[sub.h] (h=1,…,H) are also estimated as optimal solutions of constrained regression problems. Finally, by evaluating the relative loss function (12) for all possible potential assignments and selecting the one corresponding to the minimum loss value, the assignment of the object i in U and V is chosen. Note that the relative loss function cannot increase at each substep, since the whole space of feasible solutions for both U and V is explored for each object.

In the same step, the weight vectors r[sub.h] and t[sub.h] are estimated as solutions of constrained regression problems for each possible choice of the different rows of U and V, respectively.

In the second step, the constant b is then estimated by successive residualizations of the three-way data matrix.

The two main steps are alternated and iterated until convergence. The relative loss function (12) does not increase at each step, and the algorithm stops when the loss decreases by less than a small, fixed positive threshold.

In order to increase the chance of finding the global minimum, the best solution over different random starting parameters is retained.

Moreover, in order to estimate the matrices R and T, model (7) is reformulated as a regression problem with respect to the unknown vectors r and t, as follows:

$$\underline{x} = \underline{s} + \underline{k} + (b\otimes \tilde i) + \underline{e} = Q_U\, r + Q_V\, t + (b\otimes \tilde i) + \underline{e}, \tag{13}$$

where

– $\underline{x}$ is the column vector of size $HN^2$ of the vectorized matrix $\underline{X}$, i.e., $\underline{x}=\mathrm{vec}(\underline{X})=[x_{111},\dots,x_{N11},\dots,x_{11h},\dots,x_{N1h},\dots,x_{1NH},\dots,x_{NNH}]'$;

– $\underline{s}=\mathrm{vec}(\underline{S})$ and $\underline{k}=\mathrm{vec}(\underline{K})$ are the column vectors of size $HN^2$ of the vectorized matrices $\underline{S}$ and $\underline{K}$, respectively;

– $Q_U=[(I_H\otimes \tilde U)\,|\otimes|\,(1_H'\otimes U)]+[(I_H\otimes U)\,|\otimes|\,(1_H'\otimes \tilde U)]$ is a matrix of size $(HN^2\times HJ)$, where $|\otimes|$ denotes the Khatri–Rao product [23,24];

– $Q_V=[(I_H\otimes \tilde V)\,|\otimes|\,(1_H'\otimes V)]-[(I_H\otimes V)\,|\otimes|\,(1_H'\otimes \tilde V)]$ is a matrix of size $(HN^2\times HJ)$;

– $\tilde i$ is the column vector of size $N^2$ of the vectorized matrix $\tilde I_N$;

– $\underline{e}=\mathrm{vec}(\underline{E})$ is the column vector of size $HN^2$ of the error term.

Recall that, given any two matrices A and B with the same number J of columns, the Khatri–Rao product of A and B is the column-wise Kronecker product, i.e., $A\,|\otimes|\,B=(a_1\otimes b_1,\dots,a_j\otimes b_j,\dots,a_J\otimes b_J)$, where $a_j$ and $b_j$ are the j-th ($j=1,\dots,J$) columns of A and B, respectively.
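The column-wise Kronecker product can be sketched in a few lines of pure Python. The function name `khatri_rao` and the numeric matrices are illustrative assumptions.

```python
def khatri_rao(A, B):
    """Column-wise Kronecker product of A (n x J) and B (m x J), giving (n*m x J)."""
    n, m, n_cols = len(A), len(B), len(A[0])
    if len(B[0]) != n_cols:
        raise ValueError("A and B must have the same number of columns")
    # Row index `row` of the result pairs row (row // m) of A with row (row % m) of B.
    return [[A[row // m][col] * B[row % m][col] for col in range(n_cols)]
            for row in range(n * m)]


A = [[1, 2],
     [3, 4]]
B = [[1, 10],
     [100, 1000]]
P = khatri_rao(A, B)
```

Each column of `P` is the Kronecker product of the corresponding columns of `A` and `B`, so the first column is (1, 100, 3, 300) and the second is (20, 2000, 40, 4000).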

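Therefore, by taking into account (13), the relative loss function (12) becomes as follows:

$$F(U,V,r,t,b) = F^S(U,r,b) + F^K(V,t) = \frac{\left\| \underline{s} - Q_U r - (b\otimes \tilde i) \right\|^2}{\|\underline{x}\|^2} + \frac{\left\| \underline{k} - Q_V t \right\|^2}{\|\underline{x}\|^2}. \tag{14}$$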

A detailed description of the steps of the algorithm, implemented in MATLAB R2023a, is given below.

Initialization step.

Initial estimates of the parameters U^, V^, r^, t^, and b^ are chosen randomly or in a rational way, but they are required to satisfy the set of constraints (9)–(11).

Step 1. Updating the membership matrices U and V and weight-vectors r and t. (see Algorithm 1)

Given the current estimate b^, the membership matrices U and V and the weight vectors r and t are updated by minimizing (14) subject to constraints (9)–(11).

The loss function (14) is minimized sequentially for the different rows of U and V by solving N assignment problems. Finally, the column sums of the estimated U^ are checked to avoid empty clusters.

Furthermore, the weight vectors r and t are estimated in two nested substeps as follows. Given the current $\hat U$ and $\hat b$, for every possible binary choice for the different rows of U, the vector r is estimated by solving the following regression problem:

$$F^S(r;\hat U,\hat b) = \frac{\left\| \underline{s} - Q_{\hat U}\, r - (\hat b\otimes \tilde i) \right\|^2}{\|\underline{x}\|^2}. \tag{15}$$

Similarly, given the current $\hat V$, for every admissible choice for the different rows of V, the weight vector t is obtained as the solution of the following constrained regression problem:

$$F^K(t;\hat V) = \frac{\left\| \underline{k} - Q_{\hat V}\, t \right\|^2}{\|\underline{x}\|^2}, \tag{16}$$

subject to constraint (11).

The pseudocode for updating the i-th row of U and V, holding all other rows constant, and for updating r and t is as follows.

In the following, let 0 be the J-column vector of zeros; p[sup.(j)] be the J-dimensional vector with all entries equal to 0 except for the j-th one, which is 1; and u[sub.i] and v[sub.i] denote the vectors corresponding to the i-th row of U and V, respectively.

Algorithm 1 Step 1.

begin
  for i := 1 to N do
    for j := 1 to J do
      $u_i^{(j)} = p^{(j)}$;  $U^{(j)} = [u_1,\dots,u_i^{(j)},\dots,u_N]'$;
      (17)  $r^{(j)} = \left(Q_{U^{(j)}}' Q_{U^{(j)}}\right)^{-1} Q_{U^{(j)}}' \left(\underline{s} - (\hat b\otimes \tilde i)\right)$;
      comment: solution of the regression problem (15) corresponding to the possible assignment of object i to cluster j of the complete partition (U);
      for w := 1 to 2 do
        if w = 1 then $v_i^{(j,w)} = u_i^{(j)}$,
        else if w = 2 then $v_i^{(j,w)} = 0$;
        end;
        $V^{(j,w)} = [v_1,\dots,v_i^{(j,w)},\dots,v_N]'$;
        $N_V = 1_N' V^{(j,w)} 1_J$;
        comment: number of assigned objects in the incomplete partition;
        (18)  $\tilde t^{(j,w)} = \left(Q_{V^{(j,w)}}' Q_{V^{(j,w)}}\right)^{-1} Q_{V^{(j,w)}}'\, \underline{k}$;
        comment: solution of the regression problem (16) corresponding to the possible assignment of object i to the cluster j of the incomplete partition (V);
        for h := 1 to H do
          (19)  $t_h^{(j,w)} = V^{(j,w)+}\left(V^{(j,w)}\tilde t_h^{(j,w)} - \dfrac{1_N' V^{(j,w)} \tilde t_h^{(j,w)}}{N_V}\, V^{(j,w)} 1_J\right)$;
          comment: constraint (11) is imposed; $\tilde t_h^{(j,w)}$ is the block of $\tilde t^{(j,w)}$ for occasion h; $V^{(j,w)+}$ is the Moore–Penrose inverse of $V^{(j,w)}$;
        end
        $t^{(j,w)} = [t_1^{(j,w)\prime},\dots,t_h^{(j,w)\prime},\dots,t_H^{(j,w)\prime}]'$;
        $f^{(j,w)}(u_i^{(j)}, r^{(j)}, v_i^{(j,w)}, t^{(j,w)}) = F^S(U^{(j)}, r^{(j)}; \hat b) + F^K(V^{(j,w)}, t^{(j,w)})$;
      end
    end
    $(l,g) = \arg\min_{1\le j\le J}\left(\arg\min_{w\in\{1,2\}} f^{(j,w)}\right)$;
    $\hat u_i = p^{(l)}$;  $\hat r = r^{(l)}$;
    if g = 1 then $\hat v_i = \hat u_i$, $\hat t = t^{(l,1)}$,
    else if g = 2 then $\hat v_i = 0$, $\hat t = t^{(l,2)}$;
    end;
  end
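The centering in (19) can be illustrated for a single occasion. The sketch below rests on a simplifying assumption that is not stated in the paper: every cluster of the incomplete partition is non-empty, so that V has full column rank and its Moore–Penrose inverse satisfies V⁺V = I. Under that assumption, (19) amounts to subtracting from each unconstrained weight the size-weighted mean of the weights. The function name `center_weights` and the numbers are hypothetical.

```python
def center_weights(V, t_tilde):
    """Impose constraint (11) on unconstrained weights for one occasion.

    Assumes every column of V has at least one 1 (no empty cluster G_j).
    """
    n_clusters = len(t_tilde)
    sizes = [sum(row[j] for row in V) for j in range(n_clusters)]
    n_assigned = sum(sizes)                      # N_V in the pseudocode
    shift = sum(n * tj for n, tj in zip(sizes, t_tilde)) / n_assigned
    return [tj - shift for tj in t_tilde]


V = [[1, 0], [1, 0], [0, 1]]       # hypothetical incomplete partition
t_tilde = [4.0, 1.0]               # hypothetical unconstrained weights
t = center_weights(V, t_tilde)

# Constraint (11): the size-weighted sum of the weights is zero.
sizes = [sum(row[j] for row in V) for j in range(2)]
weighted_sum = sum(n * tj for n, tj in zip(sizes, t))
```

Here the shift is (2·4 + 1·1)/3 = 3, so the constrained weights are 1 and −2. If some cluster were empty, the pseudo-inverse in (19) would behave differently and this shortcut would not apply.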

Step 2. Updating constant b.

Given the current estimates of $\hat U$, $\hat V$, $\hat r$, and $\hat t$, the estimate of b is given by the following:

$$\hat b = \left[(I_H\otimes \tilde i')(I_H\otimes \tilde i)\right]^{-1}(I_H\otimes \tilde i')\left(\underline{s} - Q_{\hat U}\,\hat r\right), \tag{20}$$

where the inverse always exists, since $[(I_H\otimes \tilde i')(I_H\otimes \tilde i)]$ is a full-rank diagonal matrix of size H with diagonal elements all equal to $N(N-1)$.
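Since the matrix to be inverted in (20) is diagonal with entries N(N−1), the update reduces, occasion by occasion, to the mean off-diagonal residual of the symmetric component. The following sketch illustrates this reading; the function name `update_constant` and the toy matrices are assumptions, and `fitted_S` stands for the clustering part of the symmetric component without the constant.

```python
def update_constant(S, fitted_S):
    """Mean off-diagonal residual of the symmetric component for one occasion."""
    n = len(S)
    total = sum(S[i][l] - fitted_S[i][l]
                for i in range(n) for l in range(n) if i != l)
    return total / (n * (n - 1))


# Hypothetical symmetric component and its clustering part for one occasion.
S_h = [[0.0, 12.0, 15.0],
       [12.0, 0.0, 15.0],
       [15.0, 15.0, 0.0]]
fitted_S_h = [[0.0, 2.0, 5.0],
              [2.0, 0.0, 5.0],
              [5.0, 5.0, 0.0]]
b_h = update_constant(S_h, fitted_S_h)
```

In this toy case every off-diagonal residual equals 10, so the estimated constant is 10.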

Stopping rule.

The relative loss function value is computed for the current values of U^, V^, r^, t^, and b^ and since F(U^,V^,r^,t^,b^) is bounded from below, it converges to a point that is expected to be at least a local minimum. When the loss function (14) has not decreased considerably with respect to a tolerance value, the process is assumed to be converged. Otherwise, steps 1 and 2 are repeated in turn.

Meaning of the Parameter Estimates

In order to evaluate the meaning of the estimated weights $\hat r$ of the complete partition, let us consider the membership matrix $\hat U=[\hat u_{ij}]$ ($i=1,\dots,N$; $j=1,\dots,J$) that uniquely identifies the clusters of the complete partition $\{C_1,\dots,C_j,\dots,C_J\}$.

From Equation (17), the estimated weight r^[sub.jh] of the cluster C[sub.j] in the h-th occasion is the average amount of the exchanges between objects in cluster C[sub.j] and objects in clusters different from C[sub.j], corrected for the mean of the average amounts between all clusters different from C[sub.j]. Hence, large (small) values of r^[sub.jh] indicate clusters characterized by large (small) amounts of exchanges on average.

Similarly, let us consider the incomplete partition $\{G_1,\dots,G_j,\dots,G_J\}$ identified by the matrix $\hat V=[\hat v_{ij}]$ ($i=1,\dots,N$; $j=1,\dots,J$). Then, from (18) imposing the constraint (11), the weight $\hat t_{jh}$ of the cluster $G_j$ in the occasion h is as follows:

$$\hat t_{jh} = \frac{1}{N\times N_{G_j}}\sum_{i=1}^{N}\sum_{l=1}^{N} k_{ilh}\,\hat v_{ij}\,\hat{\tilde v}_{lj}, \qquad (j=1,\dots,J;\ h=1,\dots,H),$$

where $N_{G_j}$ is the number of objects assigned to the cluster $G_j$ and $\hat{\tilde v}_{lj}=1-\hat v_{lj}$.

Note that t^[sub.jh] is the average imbalance from all objects in the cluster G[sub.j] towards all objects in clusters other than G[sub.j], at the occasion h, corrected for the average imbalance originating from all clusters different from G[sub.j]. Therefore, a positive (negative) weight t^[sub.jh] qualifies the cluster G[sub.j] as a “destination” (“origin”) cluster of the exchanges at the occasion h, and the objects belonging to such a cluster have a similar pattern in terms of exchanges directed towards the other clusters.
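The closed form of the weight of a cluster of the incomplete partition can be checked on a noise-free toy example. In the sketch below, the function name `cluster_imbalance_weight` and all values are assumptions: the skew-symmetric matrix is generated from weights (1, −2) that satisfy the zero-sum constraint, and the formula is expected to recover them.

```python
def cluster_imbalance_weight(K, V, j):
    """Average imbalance from cluster G_j towards everything outside G_j."""
    n = len(K)
    n_gj = sum(V[i][j] for i in range(n))        # size of G_j
    total = sum(K[i][l] * V[i][j] * (1 - V[l][j])
                for i in range(n) for l in range(n))
    return total / (n * n_gj)


# Hypothetical setting: G1 = {first two objects}, G2 = {third object}.
V = [[1, 0], [1, 0], [0, 1]]
# Noise-free imbalances generated with t = (1, -2): k_il = t_a - t_b between clusters.
K = [[0.0, 0.0, 3.0],
     [0.0, 0.0, 3.0],
     [-3.0, -3.0, 0.0]]

t1 = cluster_imbalance_weight(K, V, 0)
t2 = cluster_imbalance_weight(K, V, 1)
```

With these data the first cluster gets a positive weight and the second a negative one, matching the generating values.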

Given the occasion h and due to (3), all objects belonging to the same cluster G[sub.j] have the same weight t^[sub.jh], and the weighted sum of them over all clusters is zero due to constraint (11). Constraint (11) is necessary to guarantee the identifiability of the model and the uniqueness of the solution: this is because the weights are defined by difference. In a general and widely applicable framework, such a choice implies that the model defines a closed exchange system in the sense that the total imbalance between clusters within each occasion is zero. Note that, in specific contexts, a value other than zero could generally be chosen to handle appropriate assumptions. Only (19) in the algorithm should be modified accordingly.

Furthermore, the objects that remain unassigned to any cluster of the incomplete partition are actually objects that generate (almost) zero mutual imbalances and, equivalently, almost symmetric exchanges in all occasions.
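As an illustration of what "almost symmetric exchanges in all occasions" means in practice, the following minimal sketch splits each occasion matrix into its symmetric and skew-symmetric parts and flags the objects whose imbalances stay small everywhere. It assumes the usual decomposition S = (X + X')/2 and K = (X − X')/2; the function names, the toy matrices, and the tolerance are hypothetical, and the check is only a descriptive diagnostic, not the assignment step of the algorithm.

```python
import numpy as np

def split_symmetric_skew(x):
    """Return (S, K) with S = (X + X')/2 and K = (X - X')/2, so that X = S + K."""
    x = np.asarray(x, dtype=float)
    return (x + x.T) / 2.0, (x - x.T) / 2.0

def nearly_symmetric_objects(x_stack, tol):
    """Indices of objects whose absolute imbalance stays within tol
    towards every other object and in every occasion."""
    x_stack = np.asarray(x_stack, dtype=float)              # shape (H, N, N)
    k_stack = (x_stack - x_stack.transpose(0, 2, 1)) / 2.0  # skew part per occasion
    worst = np.abs(k_stack).max(axis=(0, 2))                # largest imbalance per object
    return np.flatnonzero(worst <= tol).tolist()

# Hypothetical toy data: H = 2 occasions, N = 3 objects.
# Objects 0 and 1 exchange asymmetrically; object 2 is symmetric with both.
x_toy = np.array([
    [[0, 60, 30], [20, 0, 40], [30, 40, 0]],
    [[0, 10, 35], [50, 0, 45], [35, 45, 0]],
])

s1, k1 = split_symmetric_skew(x_toy[0])
symmetric_like = nearly_symmetric_objects(x_toy, tol=1.0)  # tol is an arbitrary choice
```

In this toy case, only object 2 would be a candidate for remaining unassigned in an incomplete partition.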

The constant term b^[sub.h] is assumed to be added to all dissimilarities in the h-th occasion and, thus, only affects the average amounts of exchanges between objects (symmetric component S[sub.h]). Therefore, the additive constant b^[sub.h] in (20) represents the baseline average dissimilarity, independent of any clustering and direction, and plays the same role as the intercept of a linear regression model.

Finally, it is worth noting that, for each occasion h, model (7) accounts for the between-cluster effects, while the exchanges within clusters are fitted only by the constant term b^[sub.h], which actually also represents the average exchange between objects within clusters.

4. Illustrative Example: Results

In order to detect the underlying clustering structure common to all occasions and identify “origin” and “destination” clusters, the model proposed here was fitted to the artificial data of Figure 1, together with the Generalized INDCLUS model [20] for comparison.

Let us briefly recall the generalization of the INDCLUS model [20] to fit three-way asymmetric similarity data Y[sub.h] (h=1,…,H). The model assumes that there exist two sets of J overlapping clusters (i.e., two coverings) of the same set of N objects in row and column, and that they are common to all occasions, namely, P=[p[sub.ij]] and Q=[q[sub.ij]], where p[sub.ij] and q[sub.ij] assume values in {0,1} (for i=1,…,N and j=1,…,J); i.e., each object is allowed to belong to more than one cluster or none at all. In addition, a single set of weights is assumed for both sets of clusters, but different for each occasion. The model can be written as follows:

Y[sub.h]=PW[sub.h]Q[sup.']+c[sub.h]1[sub.NN]+E[sub.h][sup.*] , (h=1,…,H), (21)

where W[sub.h] is the non-negative diagonal weight matrix of order J for the occasion h, c[sub.h] is a real-valued additive constant for the occasion h, and E[sub.h][sup.*] is the error term.

Since model (21) fits similarities, the original artificial dissimilarities in [0,100] of Figure 1 were simply converted into similarities by taking Y[sub.h]=100-X[sub.h] (h=1,2,3).

The best solution in J=3 clusters was retained over 100 random starts of the algorithm.

Neither of the two best coverings in three overlapping clusters obtained by the Generalized INDCLUS model (relative loss function equal to 0.0228) identifies the true generated partition of the objects. The common coverings P and Q for the sets of objects in the rows and in the columns, respectively, consist of three row clusters, C[sub.1][sup.Ir]={a,b,c,d}, C[sub.2][sup.Ir]={e,f}, and C[sub.3][sup.Ir]={b,c,g,h,i}, and three column clusters, C[sub.1][sup.Ic]={b,c,e,f}, C[sub.2][sup.Ic]={a,b,c,d,e,f,g,h,i}, and C[sub.3][sup.Ic]={a,b,c,d,g,h,i}. The first covering identifies row clusters of objects with similar outgoing exchanges to the column clusters, which contain groups of objects with similar incoming exchanges.

The estimated weights of the three clusters of both coverings in each occasion and the constants are reported in Table 1. The constants represent the average exchange between all objects within each occasion, regardless of the direction. Thus, the second occasion has the highest average level of exchange (c[sub.2]=59.7) and the first occasion the lowest (c[sub.1]=49.6).

Table 1 displays the weights of the outgoing row clusters and incoming column clusters, where a large positive weight for any two corresponding clusters C[sub.j][sup.Ir] and C[sub.j][sup.Ic] (j=1,2,3) qualifies C[sub.j][sup.Ir] as the origin of the exchanges directed to the destination C[sub.j][sup.Ic]. Therefore, in the second occasion, C[sub.1][sup.Ir]={a,b,c,d} is an origin cluster to C[sub.1][sup.Ic]={b,c,e,f}. In addition, it can be noted that, in occasion 2, Generalized INDCLUS fails to identify the true behavior of objects {g,h,i} in C[sub.3] as destination from C[sub.1]={a,b,c,d} and origin to C[sub.2]={e,f} (Figure 1). Instead, objects {g,h,i} are estimated here to have mutual exchanges with all other objects due to their membership in several column clusters.

Moreover, in the first and third occasions, the clusters C[sub.2][sup.Ir]={e,f} and C[sub.3][sup.Ir]={b,c,g,h,i} are origins toward C[sub.2][sup.Ic], which contains all objects, and C[sub.3][sup.Ic]={a,b,c,d,g,h,i}, respectively.

We can observe that, in the special case when the clusters from Generalized INDCLUS form a partition (they do not overlap), model (21) reduces to estimating only the exchanges from any C[sub.j][sup.Ir] row cluster to its corresponding C[sub.j][sup.Ic] column cluster. Conversely, in the presence of overlapping clusters, objects common to different clusters also contribute to the estimation of the exchange between different clusters. That is why, in Table 1, due to the large cluster overlap, it is very cumbersome to extract the directions of the exchange from any row cluster C[sub.j][sup.Ir] to any other column cluster C[sub.j][sup.Ic]. It becomes necessary to look at the whole full-size (N×N) estimated matrices and analyze the estimated exchanges between pairs of objects.
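To make the last point concrete, the sketch below builds the full-size estimated similarity matrix of model (21), without the error term, from the coverings reported above and the occasion-specific weights of Table 1. It is only an illustration: the helper names are hypothetical, and the blank cells of Table 1 are assumed here to be zero weights.

```python
import numpy as np

objects = list("abcdefghi")

# Row and column coverings reported for the Generalized INDCLUS solution.
row_clusters = [set("abcd"), set("ef"), set("bcghi")]
col_clusters = [set("bcef"), set("abcdefghi"), set("abcdghi")]

def membership(clusters):
    """Binary N x J membership matrix of a covering."""
    return np.array([[1.0 if o in c else 0.0 for c in clusters] for o in objects])

P = membership(row_clusters)
Q = membership(col_clusters)

def fitted_similarities(P, Q, w, c):
    """P W Q' + c 1 1' for one occasion (model (21) without the error term)."""
    n = P.shape[0]
    return P @ np.diag(w) @ Q.T + c * np.ones((n, n))

# Weights and constants from Table 1 (blank cells assumed to be zero).
Y1_hat = fitted_similarities(P, Q, w=[0.0, 38.0, 18.5], c=49.6)
Y2_hat = fitted_similarities(P, Q, w=[25.1, 0.0, 0.0], c=59.7)

idx = {o: i for i, o in enumerate(objects)}
a_to_b_occ2 = Y2_hat[idx["a"], idx["b"]]   # a in row cluster 1, b in column cluster 1
g_to_a_occ2 = Y2_hat[idx["g"], idx["a"]]   # only the constant contributes
b_to_a_occ1 = Y1_hat[idx["b"], idx["a"]]   # b and a share only the third pair of clusters
```

Because an object such as b belongs to several row and column clusters, its fitted exchanges sum the weights of every pair of corresponding clusters it shares with the partner object, which is why the cluster-level weights alone do not reveal the directions of exchange.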

For the sake of clarity, we observe that, in this well-structured artificial situation, the lowest weights are exactly zero, which is generally not the case.

Model (4) proposed here was fitted to the artificial data of Figure 1, and the best solution in J=3 clusters was retained over 100 random starts of the algorithm.

The best resulting partition (relative loss equal to 0.0026) correctly identifies the complete partition from the symmetric components consisting of clusters C[sub.1]={a,b,c,d}, C[sub.2]={e,f}, C[sub.3]={g,h,i}, and the incomplete partition formed by clusters G[sub.1]={a,d}, G[sub.2]={e,f}, G[sub.3]={i}, which are nested in the corresponding clusters of the former.

The objects b, c, g, and h remain unassigned to any cluster in the incomplete partition from the skew-symmetric components due to their almost zero mutual imbalances and almost symmetric exchanges at each occasion.

Table 2 reports the estimates of the weight vectors r^ and t^ and the constant b^. In addition, for each occasion h, the estimated between-cluster components, both the symmetric (S^[sub.h]) from the complete partition and the skew-symmetric (K^[sub.h]) from the incomplete partition, are shown in Figure 3.

The estimated constants b^[sub.h] (Table 2) represent the baseline average exchange level at each occasion, regardless of any clustering, and take on different values in the occasions: the highest (b^[sub.1]=31.8) in the first, while the third has the lowest (b^[sub.3]=19.7).

Given the occasion h, the weight r^[sub.jh] of the cluster C[sub.j] (j=1,2,3) represents the average exchange between the cluster C[sub.j] and every other cluster, in addition to the additive constant. Thus, from the occasion-specific weights r^[sub.h] (h=1,2,3) in Table 2, we observe that, in occasion 1, C[sub.1] has the highest average exchange with the other two clusters (r^[sub.11]=8.8), while in occasion 2, the highest average exchange concerns C[sub.3] (r^[sub.32]=9.7). In contrast, in occasion 3, cluster C[sub.3] has the lowest weight (r^[sub.33]=-1.6), while C[sub.2] has the highest (r^[sub.23]=14.5). This is consistent with what can be seen from the data in Figure 2.

In addition, the different role of each cluster can be identified from the occasion-specific weights t^[sub.h] (h=1,2,3) of the clusters of the incomplete partition, which allow for qualifying origins and destinations. Since each t^[sub.jh] represents the weighted average imbalance at the occasion h from the cluster G[sub.j] to all other clusters, in Table 2, the negative weight of G[sub.1] in the second occasion (t^[sub.12]=-13.2) qualifies G[sub.1] as an origin cluster, while it acts as a destination cluster in occasions 1 and 3, where its corresponding weights are both positive (t^[sub.11]=16.2 and t^[sub.13]=14.1, respectively). The reverse is true for both clusters G[sub.2] and G[sub.3].

As an example, we can see from Figure 3 that the estimated symmetric part of the exchange between the clusters C[sub.1] and C[sub.2] in occasion 1 is 34.6, which is the sum of the estimated weights of C[sub.1] and C[sub.2] (r^[sub.11]=8.8 and r^[sub.21]=-6.0, respectively) plus the constant value (b^[sub.1]=31.8), i.e., (s^[sub.121]+b^[sub.1])=8.8-6.0+31.8=34.6.

Moreover, the corresponding estimated imbalance between G[sub.1] and G[sub.2] yields k^[sub.121]=31.5, which is the difference between the estimated weights of the clusters G[sub.1] and G[sub.2] (t^[sub.11]=16.2 and t^[sub.21]=-15.3, respectively), i.e., k^[sub.121]=16.2-(-15.3)=31.5.

Finally, the estimated between-cluster dissimilarity between C[sub.1] and C[sub.2] in the first occasion is the sum of (s^[sub.121]+b^[sub.1]) and k^[sub.121], hence x^[sub.121]=66.1.
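The arithmetic of this worked example can be reproduced for all pairs of clusters with a short sketch. It assumes, as in the example above, that the symmetric between-cluster part is the sum of the two cluster weights plus the constant and that the imbalance is the difference of the two weights t^; the function name is hypothetical, and the sketch works at the cluster level only (objects left unassigned in the incomplete partition are not handled).

```python
import numpy as np

# Estimates from Table 2: rows are occasions 1-3, columns are clusters 1-3.
r_hat = np.array([[8.8, -6.0, 5.8],
                  [6.9, -4.4, 9.7],
                  [6.4, 14.5, -1.6]])
t_hat = np.array([[16.2, -15.3, -1.9],
                  [-13.2, 12.3, 1.8],
                  [14.1, -12.5, -3.4]])
b_hat = np.array([31.8, 28.4, 19.7])

def between_cluster_estimates(r, t, b):
    """Cluster-level estimates for one occasion.

    sym[j, l]  = r_j + r_l + b   (symmetric part plus constant)
    skew[j, l] = t_j - t_l       (imbalance)
    """
    sym = r[:, None] + r[None, :] + b
    np.fill_diagonal(sym, b)      # within-cluster exchanges are fitted by b only
    skew = t[:, None] - t[None, :]
    return sym, skew, sym + skew

# Occasion 1, clusters 1 and 2 (zero-based indices 0 and 1).
sym1, skew1, x1 = between_cluster_estimates(r_hat[0], t_hat[0], b_hat[0])
print(round(sym1[0, 1], 1), round(skew1[0, 1], 1), round(x1[0, 1], 1))
```

The same call with `r_hat[1], t_hat[1], b_hat[1]` gives the reduced-size matrices for the second occasion, where the sign of the imbalance between the first two clusters is reversed.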

Remark 1.

From Figure 3, we can easily derive the summary of what emerges from the detailed results, since the full-size exchange matrices between objects (Figure 2) are synthesized into reduced-size exchange matrices between clusters with a very limited loss of information. This makes it possible to show the role played by each cluster of objects as an origin or destination of exchange, exactly reflecting the original data in Figure 1.

5. Application to Student Mobility Data

The data analyzed in this application have been taken from the OECD (Organization for Economic Co-operation and Development) Education Statistics Database, which collects data annually on international student exchanges in tertiary education.

The three-way data array consists of H=5 asymmetric matrices of international student mobility exchanges observed among N=20 countries (objects), one matrix for each year from 2016 to 2020 (occasions). The data report the number of international tertiary students who received their prior education in the origin country (the rows of each year matrix) enrolled in the host country (the columns of each year matrix). The countries are the twenty founding members of the OECD: Austria (AT), Belgium (BE), Canada (CA), Denmark (DK), France (FR), Germany (DE), Greece (EL), Iceland (IS), Ireland (IE), Italy (IT), Luxembourg (LU), Netherlands (NL), Norway (NO), Portugal (PT), Spain (ES), Sweden (SE), Switzerland (CH), Turkey (TR), United Kingdom (UK), and United States of America (US).

Between 2016 and 2020, the total population of mobile international students in the twenty founding members of the OECD experienced a remarkable growth of 16.8%. Specifically, the number jumped from 555,920 students in 2016 to 649,376 students in 2020, reflecting a significant increase in the number of tertiary students enrolled abroad. As highlighted in [25], the COVID-19 pandemic had a very uneven impact on international student exchanges across countries during 2019–2020. Nevertheless, the percentage of tertiary students enrolled abroad increased in several of the twenty OECD countries and remained unchanged in many others. Overall, there were about three international students for every national student studying abroad in OECD countries in 2016–2020, but this ratio is equal to or greater than ten in the United Kingdom and the United States [25,26]. In contrast, Luxembourg is among the twenty OECD founder countries with the lowest ratio of international students to national students abroad [25].

The aim here is to explore both the magnitude and the direction of the international mobility of the tertiary students to identify clusters of countries that are the main origins or destinations of the student exchanges across 5 years (2016–2020).

First, the data were transformed by calculating for each country the percentage share of outgoing mobile students moving to each of the other countries for tertiary education.

5.1. Mobility Data: Results from Generalized INDCLUS

The Generalized INDCLUS model (21) was first fitted directly to the original similarity data.

The algorithm was run by varying the number of clusters J from 2 to 7, and the best solution in 100 runs from different random starting partitions was retained to reduce the risk of ending in a local minimum.

The choice of the optimal number of clusters was determined by looking at the decrease in the relative loss function as J increases. From the scree plot (Figure 4a), the solution in J=5 clusters was chosen as the best solution for Generalized INDCLUS.

Table 3 shows the row and column clusters, while the occasion-specific weights are reported in Table 4.

All countries belong to at least one cluster of the row covering, but while Iceland and the United Kingdom belong to only one cluster (C[sub.2][sup.Ir] and C[sub.3][sup.Ir], respectively), Austria, Canada, Ireland, Switzerland, and the United States belong to three clusters. The strong overlap of the row clusters indicates a very heterogeneous situation of student mobility with, in principle, many countries from which students leave.

On the other hand, only nine countries belong to at least one column cluster, while the remaining countries are not assigned to any cluster; i.e., they correspond to zero row profiles in the membership matrix Q. This reveals a greater concentration of incoming flows in a few countries than that of outgoing flows from the countries of origin.

The cluster weights of the two coverings (Table 4) are quite similar over time, indicating that the mobility pattern is quite stable over the years considered.

Every year, student mobility is directed from almost all European countries (C[sub.1][sup.Ir] and C[sub.2][sup.Ir]) to two main European destinations (Germany and UK), as well as to the US from Canada and Northern Europe.

From the solution, we can say, for example, that Norway and Sweden, together in cluster C[sub.2][sup.Ir] and separately in clusters C[sub.1][sup.Ir] and C[sub.3][sup.Ir], respectively, have outgoing flows mainly to Denmark, the UK, and the US. Swedish students also move to France and the Netherlands, while Norwegian students also move to Germany. Swedish and Norwegian mobility to all other countries turns out to be reciprocal to the same extent.

5.2. Mobility Data: Results from the Proposed Model

Since the proposed model fits dissimilarity data, the percentage shares of outgoing students in each year were converted to pairwise dissimilarities by taking their complements to 100.
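The two data transformations used in this section (percentage shares of outgoing students, then complements to 100) can be sketched as follows. The counts are hypothetical toy values, not OECD data, and the function names and the treatment of the diagonal (ignored and set to zero) are assumptions made for illustration.

```python
import numpy as np

def outgoing_shares(counts):
    """Row-wise percentage share of outgoing students by destination.

    counts[i, l] is the number of students from origin i enrolled in host l.
    The diagonal is ignored (assumption). Origins with no outgoing students
    would need special handling, which is not covered here.
    """
    counts = np.asarray(counts, dtype=float).copy()
    np.fill_diagonal(counts, 0.0)
    totals = counts.sum(axis=1, keepdims=True)
    return 100.0 * counts / totals

def to_dissimilarities(shares):
    """Complements to 100 of the shares, with a zero diagonal (assumption)."""
    d = 100.0 - np.asarray(shares, dtype=float)
    np.fill_diagonal(d, 0.0)
    return d

# Hypothetical counts for three countries in one year.
counts_toy = np.array([[0, 300, 100],
                       [50, 0, 150],
                       [20, 60, 0]])

shares_toy = outgoing_shares(counts_toy)      # similarities, as fitted by model (21)
dissim_toy = to_dissimilarities(shares_toy)   # dissimilarities, as fitted by the proposed model
```

Note that the shares are asymmetric by construction: a large flow from country i to country l relative to all outgoing students of i does not imply a large share in the opposite direction.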

As before, the algorithm was run with J varying from 2 to 7, and the best solution in 100 runs from different random starting partitions was retained. The scree plot of the loss values in Figure 4b shows an elbow at the partition with J=6 clusters selected for analysis.

The complete partition and the incomplete partition estimated from the symmetric and skew-symmetric components of the exchanges, respectively, are reported in Table 5, where countries not assigned to the corresponding clusters of the incomplete partition are indicated in italics.

The six clusters are shown in Figure 5, where different colors indicate different clusters and the unassigned countries of the incomplete partition are shown in light colors on the color scale.

The clusters C[sub.1], C[sub.2], and C[sub.6] coincide with G[sub.1], G[sub.2], and G[sub.6], respectively, while six countries (Austria, Belgium, Denmark, France, Netherlands, and Spain) do not belong to any cluster of the incomplete partition because their mutual mobility is basically symmetrical.

The two nested partitions highlight the strength of proximity factors, such as language, historical ties, geographical distance, bilateral relationships, and political framework conditions (e.g., the European Higher Education Area) as key determinants for international student mobility. As an example, the Scandinavian countries in cluster C[sub.3] share the same patterns of international student mobility directed to the other OECD countries.

Austria and Belgium, which belong to different clusters (C[sub.5] and C[sub.4], respectively) but remain unassigned in the incomplete partition, show small mutual mobility exchanges over time (0.7195% in 2016 and 0.8485% in 2020) with minimal imbalances (0.3077% in 2016 and 0.5077% in 2020).

The constants b^[sub.h] (h=1,…,5) increase slightly over the years (Table 6 and Figure 6): this means that since the model has been fitted to the complements to 100 of the share of outgoing students in each year, the baseline average mutual exchange between countries (100-b^[sub.h], h=1,…,5) shows a slight decrease over the years. This also implies that the average mutual mobility within clusters tends to decrease slightly over time, as the baseline fits the within-cluster mobility in each year.

In addition to the baseline exchange (b^[sub.h]), the estimated average annual mobility between clusters of countries depends on the weights r^[sub.h] (Table 6). The average mobility has remained relatively stable over time, with very small increases (i.e., very small decreases in the weights r^[sub.jh] (h=1,…,5), taking into account the transformation of the percentages to obtain dissimilarities) in almost all clusters until 2019. Fluctuations can be observed for some clusters: in 2019, a slight increase in the average mobility was observed for the United Kingdom (r^[sub.6,2019]=-10.892 vs. r^[sub.6,2018]=-10.192), in contrast with a slight decrease for the cluster C[sub.1] with Germany and the United States (r^[sub.1,2019]=-5.473 vs. r^[sub.1,2018]=-6.232). Conversely, the situation is reversed in 2020, probably due to the impact of Brexit.

Each year, the ranking of the clusters based on the increasing weights r^[sub.h] consistently ranks the United Kingdom (C[sub.6]) with the highest average share of reciprocal mobility (regardless of direction), followed by Germany and the United States (C[sub.1]), while Scandinavian countries (C[sub.3]) have the lowest, as can be seen in Figure 7.
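The ranking described above, together with the baseline mobility 100-b^[sub.h], can be checked directly from the values of Table 6 with a few lines. This is only a sketch for reading the table; the variable names are hypothetical.

```python
import numpy as np

years = [2016, 2017, 2018, 2019, 2020]
clusters = ["C1", "C2", "C3", "C4", "C5", "C6"]

# Year-specific weights of the complete partition and constants (Table 6).
r_hat = np.array([
    [-5.653, 2.757, 3.676, 2.913, 1.273, -10.221],
    [-5.483, 2.684, 3.655, 2.818, 1.267, -10.069],
    [-6.232, 2.496, 3.519, 2.602, 1.074, -10.192],
    [-5.473, 2.447, 3.478, 2.518, 0.987, -10.892],
    [-6.122, 2.461, 3.534, 2.453, 1.174, -10.087],
])
b_hat = np.array([93.135, 93.158, 93.570, 93.589, 93.575])

# The model was fitted to complements to 100, so a lower weight means a
# higher average share of reciprocal mobility.
baseline_mobility = 100.0 - b_hat

# Clusters ordered by increasing weight, i.e., from highest to lowest mobility.
ranking = {
    year: [clusters[j] for j in np.argsort(r_hat[i])]
    for i, year in enumerate(years)
}

for year in years:
    print(year, ranking[year])
```

In every year the first two positions are C6 and C1 and the last is C3, while the order of the intermediate clusters C2 and C4 is not the same in all years.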

As for the directions of the mobility, the analysis of the cluster weights of the incomplete partition (Table 7) reveals a consistent pattern over the years: throughout the 2016–2020 period, the mobility originates from countries in the clusters G[sub.2], G[sub.3], G[sub.4], and G[sub.5] (the weights t^[sub.jh] assigned to these clusters are always negative over time) mainly directed to Germany and the United States (G[sub.1]), as well as to the United Kingdom (G[sub.6]).

In all years, the cluster ranking based on the increasing weights t^[sub.h] places Greece, Ireland, and Turkey (G[sub.2]), followed by Luxembourg and Portugal (G[sub.4]), as the main countries of origin of the international students. As can be seen from the direction of the arrow in Figure 8, the cluster G[sub.5], which includes Canada, Italy, and Switzerland, is an origin cluster of mobile students toward G[sub.1] and G[sub.6] and, to a greater extent, a destination cluster from G[sub.2], G[sub.3], and G[sub.4]. Similarly, the Scandinavian students (G[sub.3]) move to Germany, the US, and the UK (G[sub.1] and G[sub.6]) and to G[sub.5], while the Scandinavian countries have incoming mobility from G[sub.2]. This differs in part from the results of Generalized INDCLUS, which does not capture the imbalance of the international mobility from Greece, Ireland, and Turkey directed to the Scandinavian countries.

The countries in the two clusters C[sub.2] and C[sub.4] have comparable average mobility over time (the green (C[sub.2]) and red (C[sub.4]) lines overlap in Figure 7), but with different levels of imbalances: in fact, the student outflows from Greece, Ireland, and Turkey (G[sub.2]) are always larger than those from Luxembourg and Portugal (G[sub.4]), as can be seen in Figure 8.

English is the most widely spoken language in the globalized world, with one in four people worldwide using it [27]. Not surprisingly, English-speaking countries are always the most attractive study destinations overall: the UK is the top destination in Europe; the United States and Germany are destinations for international students from Europe and Canada and are the source of student mobility to the United Kingdom. Among the top 3 destinations, Germany is the major recipient country in the European Union.

A few countries (clusters G[sub.1] and G[sub.6]) are net “importers” of students; that is, they have more students coming in to study than those leaving to study abroad. In contrast, some clusters of countries, such as G[sub.2] and G[sub.4], are net “exporters” of students.

6. Results and Discussion

In a complex situation where the available data represent asymmetric proximities between pairs of objects measured or observed in different occasions, the proposed model aims to unveil a common clustering structure that accounts for the systematic differences between occasions.

The model proposed here proved to be effective in identifying clusters of objects that share a common pattern of exchange across different occasions, even with a different role as origin or destination. The artificial data and the real application have shown as main results (1) the capability of the model to identify clusters of objects with similar exchange behavior in magnitude and direction (origin/destination clusters) and (2) the flexibility to capture possible differences in the directions of such cluster exchanges across occasions, compared with the Generalized INDCLUS model.

The goal of Generalized INDCLUS is not to summarize the average exchanges of each cluster. Compared with our model, it provides solutions from which the behavior of each cluster is much harder to derive because of the overlap between clusters, as can also be seen from the small illustrative example. In the special case where the clusters from Generalized INDCLUS form a partition (they do not overlap), as in our model, it actually accounts for the variability within clusters and is not able to capture the exchanges between clusters, which is the very goal of our proposal. However, as shown in Sections 4 and 5, we found that, in the general case, the clusters resulting from Generalized INDCLUS have (often many) objects in common. On the one hand, this guarantees flexibility and the possibility to estimate the exchange between pairs of objects belonging to different clusters. On the other hand, this comes at the expense of ease of interpretation and synthesis, since the behavior of objects belonging to the same cluster is not the same regardless of the destination cluster.

Thus, within the scarcity of methods for clustering three-way asymmetric proximity data, our proposal is effective, more parsimonious, and promising when the aim is to obtain a synthesis of the average behavior of exchange between clusters across occasions (which could be useful for policy makers, for example). If, instead, the estimation of any pairwise exchange is the main concern, Generalized INDCLUS might be more flexible in general due to the possible overlap.

Further developments may consider the possibility of assuming a fuzzy clustering, where, instead of a crisp membership of an object to each cluster (u[sub.ij] in {0,1}), a membership degree in [0,1] is allowed. In addition, the inclusion of possible covariates in the model, if available, could also be taken into consideration to better investigate the determinants of the exchanges.

Author Contributions

Conceptualization, L.B. and D.V.; methodology, L.B. and D.V.; software, D.V.; validation, L.B. and D.V.; formal analysis, L.B.; investigation, L.B.; resources, D.V.; data curation, D.V.; writing—original draft preparation, L.B. and D.V.; visualization, L.B.; supervision, D.V.; project administration, D.V. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data are contained within the article, with the exception of the data analyzed in Section 5, which are available on the OECD Education Statistics Database, at https://stats.oecd.org/Index.aspx?DataSetCode=EDU_ENRL_MOBILE (accessed on 1 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. T. Saito; H. Yadohisa, Marcel Dekker: New York, NY, USA, 2005,

2. G. Bove; A. Okada Methods for the analysis of asymmetric pairwise relationships., 2018, 12,pp. 5-31. DOI: https://doi.org/10.1007/s11634-017-0307-9.

3. G. Bove; A. Okada; D. Vicari, Springer Nature Singapore: Singapore, 2021,

4. G.W. Furnas Objects and Their Features: The Metric Representation of Two Class Data., Stanford University: Stanford, CA, USA, 1980,

5. W.S. DeSarbo; G. De Soete On the Use of Hierarchical Clustering for the Analysis of Nonsymmetric Proximities., 1984, 11,pp. 601-610. DOI: https://doi.org/10.1086/208996.

6. W.S. DeSarbo; A.K. Manrai; R.R. Burke A Nonspatial Methodology for the Analysis of Two-Way Proximity Data Incorporating the Distance–Density Hypothesis., 1990, 55,pp. 229-253. DOI: https://doi.org/10.1007/BF02295285.

7. G. De Soete; W.S. DeSarbo; G.W. Furnas; J.D. Carroll The Estimation of Ultrametric and Path Length Trees from Rectangular Proximity Data., 1984, 49,pp. 289-310. DOI: https://doi.org/10.1007/BF02306021.

8. W.S. DeSarbo GENNCLUS: New models for general nonmetric clustering analysis., 1982, 47,pp. 449-475. DOI: https://doi.org/10.1007/BF02293709.

9. L. Hubert Min and max hierarchical clustering using asymmetric similarity measures., 1973, 38,pp. 63-72. DOI: https://doi.org/10.1007/BF02291174.

10. H. Fujiwara Methods for Cluster Analysis Using Asymmetric Measures and Homogeneity Coefficient., 1980, 7,pp. 12-21.

11. H. Yadohisa Formulation of Asymmetric Agglomerative Hierarchical Clustering and Graphical Representation of Its Result., 2002, 15,pp. 309-316.

12. A. Takeuchi; T. Saito; H. Yadohisa Asymmetric agglomerative hierarchical clustering algorithms and their evaluations., 2007, 24,pp. 123-143. DOI: https://doi.org/10.1007/s00357-007-0002-1.

13. D. Olszewski Asymmetric k-means algorithm., Springer: Heidelberg, Germany, 2011, Volume 6594,pp. 1-10.

14. D. Olszewski K-means clustering of asymmetric data., Springer: Berlin/Heidelberg, Germany, 2012, Volume 7208,pp. 243-254.

15. D. Olszewski; B. Ster Asymmetric clustering using the Alpha-Beta divergence., 2014, 47,pp. 2031-2041. DOI: https://doi.org/10.1016/j.patcog.2013.11.019.

16. D. Vicari Classification of asymmetric proximity data., 2014, 31,pp. 386-420. DOI: https://doi.org/10.1007/s00357-014-9159-6.

17. D. Vicari Modeling Asymmetric Exchanges Between Clusters., Springer Nature Singapore: Singapore, 2020,pp. 297-313.

18. D. Vicari CLUSKEXT: CLUstering model for SKew-symmetric data including EXTernal information., 2018, 12,pp. 43-64. DOI: https://doi.org/10.1007/s11634-015-0203-0.

19. D. Vicari; C. Di Nuzzo A between-cluster approach for clustering skew-symmetric data., 2024, 12,pp. 163-192. DOI: https://doi.org/10.1007/s11634-023-00566-2.

20. A. Chaturvedi; J.D. Carroll An alternating combinatorial optimization approach to fitting the INDCLUS and Generalized INDCLUS models., 1994, 11,pp. 155-170. DOI: https://doi.org/10.1007/BF01195676.

21. J.D. Carroll; P. Arabie INDCLUS: An individual differences generalization of ADCLUS model and the MAPCLUS algorithm., 1983, 48,pp. 157-169. DOI: https://doi.org/10.1007/BF02294012.

22. J.C. Gower The analysis of asymmetry and orthogonality., North Holland: Amsterdam, The Netherlands, 1977,pp. 109-123.

23. R.P. McDonald A simple comprehensive model for the analysis of covariance structures: Some remarks on applications., 1980, 33,pp. 161-183. DOI: https://doi.org/10.1111/j.2044-8317.1980.tb00606.x.

24. C.R. Rao; S. Mitra, Wiley: New York, NY, USA, 1971,

25. OECD, OECD Publishing: Paris, France, 2023,

26. OECD, OECD Publishing: Paris, France, 2019,

27. F. Sharifian Globalisation and developing metacultural competence in learning English as an international language., 2013, 3,p. 7. DOI: https://doi.org/10.1186/2191-5059-3-7.

Figures and Tables

Figure 1: Artificial three-way dissimilarity data. Darker shades of blue represent higher magnitudes.

Figure 2: Artificial three-way dissimilarity data: symmetric and skew-symmetric components. Darker shades of blue represent higher magnitudes.

Figure 3: Artificial three-way dissimilarity data: estimated symmetric and skew-symmetric components from the proposed model.

Figure 4: Mobility data: scree plot of the loss values of (a) Generalized INDCLUS and (b) the proposed model.

Figure 5: Mobility data: complete and incomplete partition from the proposed model. Different colors indicate different clusters: blue = C[sub.1], green = C[sub.2], violet = C[sub.3], red = C[sub.4], sky blue = C[sub.5], yellow = C[sub.6], and the unassigned countries of the incomplete partition are shown in light colors on the color scale.

Figure 6: Mobility data: year-specific constants from the proposed model.

Figure 7: Mobility data: year-specific weights of the complete partition from the proposed model.

Figure 8: Mobility data: year-specific weights of the incomplete partition from the proposed model (arrow indicates the direction of mobility).

Table 1: Artificial data: occasion-specific weights and constants from Generalized INDCLUS.

	Cluster	Constant
	Row	C[sub.1][sup.I r]	C[sub.2][sup.I r]	C[sub.3][sup.I r]
	Column	C[sub.1][sup.I c]	C[sub.2][sup.I c]	C[sub.3][sup.I c]
		w[sub.1 h]	w[sub.2 h]	w[sub.3 h]	c[sub.h]
Occasion 1			38.0	18.5	49.6
Occasion 2		25.1			59.7
Occasion 3			22.8	26.8	55.7

Table 2: Artificial data: occasion-specific weights from the proposed model.

	Complete Partition	Incomplete Partition	Constant
	C[sub.1]	C[sub.2]	C[sub.3]	G[sub.1]	G[sub.2]	G[sub.3]
	r ^[sub.1 h]	r ^[sub.2 h]	r ^[sub.3 h]	t ^[sub.1 h]	t ^[sub.2 h]	t ^[sub.3 h]	b ^[sub.h]
Occasion 1	8.8	-6.0	5.8	16.2	-15.3	-1.9	31.8
Occasion 2	6.9	-4.4	9.7	-13.2	12.3	1.8	28.4
Occasion 3	6.4	14.5	-1.6	14.1	-12.5	-3.4	19.7

Table 3: Mobility data: row and column clusters from Generalized INDCLUS.

Row Cluster		Column Cluster
$C_1^{Ir}$	Austria, Belgium, Denmark, Greece, Ireland, Italy, Luxembourg, Norway, Portugal, Spain, Switzerland, United States	$C_1^{Ic}$	Germany, United Kingdom
$C_2^{Ir}$	Canada, Denmark, Iceland, Ireland, Norway, Sweden	$C_2^{Ic}$	Denmark, United Kingdom, United States
$C_3^{Ir}$	Belgium, Canada, France, Germany, Greece, Ireland, Italy, Netherlands, Portugal, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States	$C_3^{Ic}$	France, Netherlands, United Kingdom, United States
$C_4^{Ir}$	Austria, France, Germany, Luxembourg, Netherlands, Switzerland	$C_4^{Ic}$	Austria, Belgium, Germany, Switzerland
$C_5^{Ir}$	Austria, Canada, Turkey, United States	$C_5^{Ic}$	Germany, United States

Table 4: Mobility data: occasion-specific weights and constants from Generalized INDCLUS.

		Cluster					Constant
	Row	$C_1^{Ir}$	$C_2^{Ir}$	$C_3^{Ir}$	$C_4^{Ir}$	$C_5^{Ir}$
	Column	$C_1^{Ic}$	$C_2^{Ic}$	$C_3^{Ic}$	$C_4^{Ic}$	$C_5^{Ic}$
Year		$w_{1h}$	$w_{2h}$	$w_{3h}$	$w_{4h}$	$w_{5h}$	$c_h$
2016		15.398	17.778	9.298	9.852	20.682	1.145
2017		15.145	17.496	9.357	9.689	20.175	1.185
2018		16.961	16.366	8.234	9.545	21.392	1.268
2019		18.168	15.490	7.929	9.836	21.011	1.268
2020		17.361	15.808	8.010	9.859	21.213	1.288

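Table 5: Mobility data: complete and incomplete partition from the proposed model. Countries listed in a cluster of the complete partition but missing from the corresponding cluster of the incomplete partition are unassigned in the incomplete partition.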

Complete Partition		Incomplete Partition
$C_1$	Germany, United States	$G_1$	Germany, United States
$C_2$	Greece, Ireland, Turkey	$G_2$	Greece, Ireland, Turkey
$C_3$	Denmark, Iceland, Norway, Sweden	$G_3$	Iceland, Norway, Sweden
$C_4$	Belgium, France, Luxembourg, Netherlands, Portugal, Spain	$G_4$	Luxembourg, Portugal
$C_5$	Austria, Canada, Italy, Switzerland	$G_5$	Canada, Italy, Switzerland
$C_6$	United Kingdom	$G_6$	United Kingdom

Table 6: Mobility data: year-specific weights for the complete partition and constants from the proposed model.

	Complete Partition						Constant
	$C_1$	$C_2$	$C_3$	$C_4$	$C_5$	$C_6$
Year	$\hat{r}_{1h}$	$\hat{r}_{2h}$	$\hat{r}_{3h}$	$\hat{r}_{4h}$	$\hat{r}_{5h}$	$\hat{r}_{6h}$	$\hat{b}_h$
2016	-5.653	2.757	3.676	2.913	1.273	-10.221	93.135
2017	-5.483	2.684	3.655	2.818	1.267	-10.069	93.158
2018	-6.232	2.496	3.519	2.602	1.074	-10.192	93.570
2019	-5.473	2.447	3.478	2.518	0.987	-10.892	93.589
2020	-6.122	2.461	3.534	2.453	1.174	-10.087	93.575

Table 7: Mobility data: year-specific weights of the incomplete partition from the proposed model.

	Incomplete Partition
	$G_1$	$G_2$	$G_3$	$G_4$	$G_5$	$G_6$
Year	$\hat{t}_{1h}$	$\hat{t}_{2h}$	$\hat{t}_{3h}$	$\hat{t}_{4h}$	$\hat{t}_{5h}$	$\hat{t}_{6h}$
2016	5.158	-2.568	-1.772	-1.836	-1.223	10.048
2017	5.051	-2.469	-1.770	-1.842	-1.211	9.932
2018	5.467	-2.497	-1.743	-2.000	-1.308	9.711
2019	4.814	-2.413	-1.738	-2.009	-1.208	10.466
2020	5.470	-2.403	-1.755	-2.087	-1.377	9.839

Author Affiliation(s):

[1] Department of Social and Economic Sciences, Sapienza University of Rome, P.le Aldo Moro 5, 00185 Rome, Italy

[2] Department of Statistical Sciences, Sapienza University of Rome, P.le Aldo Moro 5, 00185 Rome, Italy

Author Note(s):

[*] Corresponding author.

DOI: 10.3390/sym16060752

COPYRIGHT 2024 MDPI AG