Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

As researchers strive to narrow the gap between machine and human intelligence through the development of artificial-intelligence multimedia technologies, it is imperative to recognize the critical importance of trustworthiness in the open world, which has become ubiquitous in everyone's daily life. However, several challenges may create a crisis of trust in current open-world artificial multimedia systems and need to be bridged: 1) insufficient explanation of predictive results; 2) inadequate generalization of learning models; 3) poor adaptability to uncertain environments. Consequently, we explore a neural program that bridges trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings. 2) We then design environmental well-being task-interfaces via flexible learning regularizers to improve the generalization of trustworthy learning. 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Ultimately, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. As a result, the designed open-world protocols are applicable across a wide range of surroundings and yield significant performance improvements in open-world multimedia recognition scenarios.


INTRODUCTION
Contemporary artificial intelligence (AI) continues to furnish benefits to society from economic and environmental perspectives, among others [12,33]. As AI gradually penetrates high-risk fields that are closely tied to human well-being, such as healthcare, finance and medicine, there is a growing consensus that people urgently expect these AI solutions to be trustworthy [8,16]. For instance, lenders expect a system to provide credible explanations for rejecting their applications; engineers wish to develop common system interfaces that adapt to wider environments; businesspeople desire systems that still operate effectively under various complex conditions. The rapid proliferation of these AI solutions has resulted in a crisis of trust, as destroying the trust of stakeholders may have serious social consequences. Such crises include: 1) predictions that are difficult to understand; 2) poor generalization ability; 3) sensitivity to abnormal environments. These can be summarized as the trustworthiness crisis of AI systems in terms of interpretability [21,43,47], generalization [20,46,48] and robustness [5,19,30]. In stark contrast, professional AI developers have conventionally emphasized model performance (e.g., accuracy) as the ultimate criterion for their workflow. From the perspective of ordinary, non-specialized AI beneficiaries, this metric falls far short of convincingly reflecting the trustworthiness of these AI systems. To this end, it is imperative to construct trustworthy AI systems that go beyond performance, considering (but not limited to) the properties shown in Figure 1 to alleviate the trustworthiness crisis.
Against this backdrop, the exploration of trustworthy learning towards the open world has emerged as a notable tendency among researchers and as one of the significant implementation routes for AI systems. Hence, numerous developers have been working on a general solution for constructing trustworthy frameworks [19,24,36] (kindly refer to Section 1 of the Appendix for more related work on trustworthiness). Concomitantly, the development of trustworthy learning encounters the same three challenges as trustworthy AI, which need to be bridged: 1) the dilemma of inadequate interpretability arising from network opaqueness; 2) the problem of deficient generalization attributed to restricted model cognitive capabilities; 3) the limitation of insufficient robustness caused by various unknown data instances. Consequently, constructing trustworthy open-world learning to boost the development of AI remains challenging.
To alleviate the limitations and drawbacks mentioned above, this paper bridges trustworthiness and open-world learning by formally defining a family of designed neural programs for trustworthy learning. To achieve this purpose, we mainly proceed through the following steps: 1) Derived from a unified class of optimization-inspired technologies, objective functions with specific physical meanings facilitate the construction of trustworthy networks with design-level interpretability; 2) In pursuit of increasing the generalization of trustworthy learning, environmental well-being task-interfaces boost the perception capabilities of models with flexible representation learning regularizers (RLR), demand-driven regularizers (DDR) and graph-topological regularizers (GTR); 3) Meticulously designed open-world recognition losses and agent selection mechanisms assist the models in handling unknown or out-of-distribution inputs effortlessly, thus promoting the robustness of trustworthy learning. Naturally, we extend the above-mentioned trustworthiness properties to a more generalized multi-modal scenario. This is accomplished by proposing a comprehensive generalizable trustworthy protocol capable of enhancing trustworthy properties in an operational setting. The overall framework is demonstrated in Figure 2. Accordingly, we expect to furnish readers with enlightenment on constructing such congener trustworthy frameworks. In a nutshell, the main contributions of this paper can be listed as follows: • Bridge trustworthiness and open-world learning by exploring a family of neural approaches for enhancing interpretability, generalization, and robustness. To the best of our knowledge, this is the first work on bridging trustworthiness and open-world learning and enhancing trustworthy properties.
• Establish interlinkages of trustworthy properties with design-level explainability, environmental well-being task-interfaces and open-world recognition programs. • Provide guidelines in an operational setting for interested developers and expedite related research on these essential trustworthy problems and solutions.

THE PROPOSED FRAMEWORK
In this section, we describe the procedure for designing trustworthy learning frameworks that enhance trustworthiness properties. From a single-modal perspective, we first explain how to increase interpretability via optimization-inspired technologies at the design-level, rather than through post-hoc explainability. Afterwards, well-being task-interfaces with flexible learning regularizers empower the models' perception capabilities to preserve model-generalization. Following closely, open-world recognition programs provide robustness for frameworks by handling out-of-distribution inputs effortlessly. Grounded on this, by revisiting the above methods, we extend to a more all-encompassing multi-modal scenario.

Machine Optimization-based Design-level Interpretability
In pursuit of achieving design-level interpretability, it is necessary to introduce some general optimization terms that have a physical meaning, which prompts us to reassess fundamental physical concepts and their meanings. Normally, real-world data carrying various types of information is customarily high-dimensional. Thus, it is rational to extract significant characteristics of the relationships among the data. For example, in signal processing, we can leverage this ability to acquire knowledge of a signal and thus reconstruct the original signal to the maximum extent possible. In physics, particularly in quantum mechanics and condensed matter physics, this perspective can be interpreted as a measurement of the energy or cost associated with transforming the matrix under a change of basis or a quantum mechanical evolution [28]. Therefore, it is critical to consider it as a cornerstone of explainable network design, and the above optimization scenario can be summarized as

min_z f(x),  (1)

where f(x): Rⁿ → R denotes a matrix approximation term that has a physical meaning and can encompass various interpretable application scenarios, such as ∥X − ZD∥²_F (latent representation learning [13]), ∥X − XZ∥²_F (subspace learning [35], Z = [z_ij]_{n×n}) and so forth. However, as previously noted, due to the numerous dimensions involved in machine decision-making, a model inspired by Problem (1) alone can exhibit unsatisfactory generalization ability. Consequently, it becomes imperative to introduce a learning term whose physical significance is to balance model complexity against the ability to generalize to new data, especially unknown data that requires decision-making.
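As a concrete toy instance of such a matrix approximation term, the following sketch minimizes the latent-representation objective ∥X − ZD∥²_F over Z with a fixed dictionary D. This is a minimal NumPy illustration under assumed shapes and data, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: X (n × d) is generated exactly as Z_true @ D, so a perfect
# latent representation exists. Shapes here are illustrative assumptions.
n, d, k = 50, 20, 5
D = rng.standard_normal((k, d))        # fixed dictionary
Z_true = rng.standard_normal((n, k))   # ground-truth latent codes
X = Z_true @ D

# With D fixed, min_Z ||X - Z D||_F^2 is ordinary least squares with the
# closed-form solution Z = X D^T (D D^T)^{-1}.
Z = X @ D.T @ np.linalg.inv(D @ D.T)

print(f"reconstruction residual: {np.linalg.norm(X - Z @ D):.2e}")
```

Because the toy data admit an exact factorization, the residual is numerically zero; on real data the residual measures how much structure the chosen physical term fails to capture.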
Specifically, Problem (1) can be rewritten as

min_z f(x) + λ g(z),  (2)

where g(z): Rⁿ → R encourages the model to simplify the open-world representation of its input data by focusing only on the most relevant features and ignoring noise or irrelevant information, such as the sparse-norm, nuclear-norm, and their derivatives. Typically, a possible tighter surrogate of the objective function in (2) is to guarantee f(x) and relax g(z) only, providing a possible solution. Therefore, the subdifferential, proximal operator and Moreau envelope of a proper convex function g(z) [3] are defined as

∂g(z) = {u ∈ E : g(v) ≥ g(z) + ⟨u, v − z⟩, ∀v ∈ E},
prox_{λg}(v) = argmin_{z∈E} { g(z) + (1/2λ)∥z − v∥² },
M_{λg}(v) = min_{z∈E} { g(z) + (1/2λ)∥z − v∥² },  (3)

where prox(·) is the proximal operation, λ > L(f), L(f) is the Lipschitz constant of ∇f(·), and E is the Euclidean space. Actually, the sparse-norm, nuclear-norm, and their derivatives all necessitate proximal operations, although the procedures executed to satisfy their constraints are distinct. Therefore, with the help of the demand-driven regularizer (3), we can integrate a generalized solution to Problem (2) with physical implications, equivalently

z^(t+1) = prox_{λg}( z^(t) − (1/L(f)) ∇f(z^(t)) ),  (4)

where ∇f(·) is the gradient of Problem (1), and t is the t-th iteration. The demand-driven regularizer (4) extends some prior techniques [3,23] and its physical significance lies in its capacity to identify a solution that simultaneously satisfies Problem (2). Further, we can formalize the interpretable machine iterative equation as

z^(t+1) = T(z^(t)),  T := prox_{λg} ∘ ( I − (1/L(f)) ∇f ).  (5: machine iterative equation with interpretability)

Machine optimization-based Problem (2) and regularizer (4) provide a solid foundation (5) for moving towards network design-level interpretability. Furthermore, we will advance towards constructing trustworthy frameworks that are explainable, grounded on these machine optimization techniques.
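The demand-driven regularizer can be sketched numerically: for the sparse-norm, the proximal operator is elementwise soft-thresholding, and alternating a gradient step with a proximal step yields the classic ISTA scheme. A minimal illustrative sketch follows (function names, data and the λ value are assumptions, not the paper's code):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, D, lam, n_iter=200):
    """Proximal-gradient (ISTA) iterations for
    min_Z 0.5 * ||X - Z D||_F^2 + lam * ||Z||_1:
    a gradient step on the smooth term followed by the prox of the norm."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (Z @ D - X) @ D.T         # gradient of 0.5 * ||X - Z D||_F^2
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z

rng = np.random.default_rng(1)
D = rng.standard_normal((8, 30))
Z_true = np.zeros((5, 8))
Z_true[:, :2] = rng.standard_normal((5, 2))   # only two active atoms
X = Z_true @ D
Z_hat = ista(X, D, lam=0.1)
print("nonzeros per row:", (np.abs(Z_hat) > 1e-6).sum(axis=1))
```

The sparsity pattern of the recovered codes illustrates the physical meaning of g(z): irrelevant atoms are driven exactly to zero by the proximal operation.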

Step Forward: Generalization Environmental Well-being Task-interfaces
With the aid of these interpretability tools imbued with physical significance, we now proceed towards deep networks. Currently, the groundwork for interpretability has been established in (5), but it is limited to the optimized architecture level. In other words, notable obstacles impede our progress towards deep networks, particularly the fact that current schemes lack the same implicit optimization parameters in (5) as deep networks. In addition, a long-standing issue in machine optimization learning is manually tuning hyper-parameters such as the coefficient λ in front of g(·), which considerably hinders working efficiency. To overcome these limitations and advance towards deep networks, we add learnable variables by deduction and substitution in the demand-driven regularizer (4). Moreover, we also enable these frameworks to automatically search for an appropriate balance value by parameter self-learning, expressed as

z^(t+1) = P( z^(t) − Θ_z ∇f(z^(t)) ),  (6: demand-driven regularizer network layers)

where P in (6) is reparameterized from prox_{λg}(·), which facilitates the efficient automatic search of the task-specific hyper-parameter λ, and Θ_z is a learnable parameter. Utilizing the network layer (6), we transition from machine optimization to constructing a generalized demand-driven regularizer-centered task-interface while preserving interpretability (as outlined in Subsection 2.1). Although (6) is applicable to structured data, it is less workable for unstructured data, such as social-network data. To address this, we introduce a graph-topological regularizer term that provides a more versatile task-interface as

z^(t+1) = P( z^(t) − Θ_z ∇( f(z^(t)) + h(z^(t)) ) ),  (7: graph-topological regularizer network layers)

where the graph-topological regularizer h(·) encourages the model to learn a smooth and continuous open-world representation z. Herein, Equation (7) is associated with the problem F_h(x) := f(x) + g(z) + h(z). It physically considers the graph-topological structure of the data and also has good interpretability. Thus far, we have utilized the design-level interpretability provided by machine optimization to shift towards deep networks, and further established two generalization environmental well-being task-interfaces (6)-(7) to enhance these trustworthy properties.
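A single unrolled layer of this demand-driven flavor can be sketched as follows, with the proximal threshold and the linear maps treated as learnable parameters instead of hand-tuned hyper-parameters. This is a LISTA-style illustration; all class, parameter names and initializations are assumptions, not the paper's layer:

```python
import numpy as np

def soft_threshold(v, theta):
    """Proximal operator of theta * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

class DemandDrivenLayer:
    """One unrolled proximal step z' = prox_theta(z W_z + x W_x), where
    W_x, W_z and the threshold theta are all treated as learnable, so the
    regularization weight is found by training rather than hand-tuning.
    (Names and initial values are illustrative.)"""

    def __init__(self, d, k, theta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.standard_normal((d, k)) * 0.1   # input path
        self.W_z = np.eye(k) * 0.9                     # state path
        self.theta = theta                             # learnable threshold

    def forward(self, x, z):
        return soft_threshold(z @ self.W_z + x @ self.W_x, self.theta)

layer = DemandDrivenLayer(d=16, k=4)
x = np.ones((2, 16))
z = np.zeros((2, 4))
for _ in range(3):             # stacking layers = unrolling iterations
    z = layer.forward(x, z)
print("representation shape:", z.shape)
```

Stacking such layers corresponds to unrolling the proximal-gradient iteration a fixed number of times, which is what allows the scheme to keep its optimization-based interpretation while being trained end-to-end.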

Towards Open-world Robustness
In the practical realm, models are required not only to handle familiar data but also to operate in an open-world setting, where known and unknown data are typically intertwined during decision-making. Indeed, we derive the representation using Frameworks (6)-(7) in an open-world environment that encompasses two components: representations of known and unknown data. To enhance the robustness of the previous trustworthy frameworks and adapt them to open-world scenarios, we have devised the ensuing losses, of the generic form L = L_k(Z_g, Ẑ) + λ₁ L_p(Z_p, Ẑ) − λ₂ L_u(Ẑ), where Z_g and Z_p denote the one-hot coding of ground-truths and the pseudo labels of unlabeled samples obtained through Ẑ, respectively; Ẑ_ij indicates the probability, over the set of known and unknown classes, that the i-th representative sample belongs to class j; and λ₁ and λ₂ are two trade-off balance parameters. It should be noted that we do not minimize the loss L_u (removing its minus sign), as our objective is to attain recognition by maximizing the uncertainty of the unknown classes. Furthermore, we rank the normalized open-world representations and discard samples whose ranking values fall within the bottom and top 10%, respectively. This is because large values are favorable for recognition, while small values indicate a balanced output across each known class, which is more conducive to enhancing model robustness. Additionally, to improve the model's capability for open-world recognition, we incorporate an agent δ into the perception to aid in identifying whether a sample belongs to a known or an unknown class, as ŷ = unknown if max_j p(j|x) < δ, and ŷ = argmax_j p(j|x) otherwise, where p(j|x) is obtained from the softmax output of the representations, and ŷ is the prediction label. If the value of the present sample is below the agent δ, we reject it as an instance of an unknown class; otherwise, we designate the predicted class as the one with the highest probability [42].
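The agent-based reject-or-classify rule can be sketched as below: a sample whose maximum softmax probability falls under the agent threshold is flagged as unknown. The threshold value and the unknown label −1 are illustrative assumptions:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def agent_predict(logits, tau=0.5, unknown_label=-1):
    """Reject-or-classify: if the top softmax probability falls below the
    agent threshold tau, flag the sample as unknown; otherwise predict the
    arg-max class. (tau and unknown_label are illustrative choices.)"""
    probs = softmax(logits)
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    preds[conf < tau] = unknown_label
    return preds

logits = np.array([[4.0, 0.0, 0.0],    # confident -> known class 0
                   [0.1, 0.0, 0.1]])   # near-uniform -> rejected as unknown
print(agent_predict(logits))
```

A near-uniform (balanced) softmax output is exactly the high-uncertainty signature that the maximized unknown loss encourages, so it is routed to the unknown class.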

Revisiting Overall Trustworthy Frameworks and Derived Examples
When revisiting the proposed trustworthy frameworks, we observe that the output of the design-level interpretability networks is exactly the minimizer of an objective function during model training. This is particularly interesting because it aligns with the concept of bi-level optimization [6] protocols, which can be formalized as min_θ L(z*(x, θ), y), where the lower-level regularizer-centered networks are designed to obtain a meaningful representation z*, y denotes the ground-truths, and the upper-level open-world training losses are incorporated to enhance the awareness of task representations by optimizing the model parameters θ.
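The bi-level protocol can be illustrated with a deliberately tiny example: the lower level runs unrolled shrinkage iterations to produce z*(x, θ), and the upper level selects the regularization strength θ that minimizes a task loss. Here the upper level is a simple grid search over a squared error; all specifics are assumptions for illustration, not the paper's training procedure:

```python
import numpy as np

def lower_level(x, theta, n_iter=50):
    """Lower level: unrolled shrinkage iterations return the representation
    z*(x, theta), a stand-in for the regularizer-centered network."""
    z = np.zeros_like(x)
    for _ in range(n_iter):
        v = z + 0.5 * (x - z)                  # gradient step on 0.5*||x - z||^2
        z = np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
    return z

def upper_level(x, y, thetas):
    """Upper level: pick the theta whose representation minimizes the task
    loss L(z*(x, theta), y) -- a squared error for illustration."""
    losses = [np.sum((lower_level(x, t) - y) ** 2) for t in thetas]
    return thetas[int(np.argmin(losses))]

x = np.array([3.0, 0.5, -2.0, -0.4])   # noisy input
y = np.array([3.0, 0.0, -2.0, 0.0])    # target: small entries suppressed
best = upper_level(x, y, thetas=[0.0, 0.1, 0.3])
print("selected theta:", best)
```

A moderate threshold wins: θ = 0 keeps the noise, while a large θ distorts the informative entries, so the upper level settles on the middle value.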
Under the supervision of the above protocols, we reorganize the exploration and design process of the entire proposed trustworthy framework. Initially, we confer interpretability on the trustworthy frameworks at the design-level, grounded on open-world data physics principles and machine optimization. Subsequently, with the assistance of machine-optimized regulations, we progress towards deep networks and establish generalization for dependable frameworks at the task-level. Concurrently, the open-world recognition losses and agent mechanisms strengthen the model's capability to perceive unknown instances at the loss-level.

Demand-driven Regularizer-centered Network Layers. Under Framework (6), several examples can be derived, as outlined in Table 1 of the Appendix, which can be generalized as follows

Z^(t+1) = P( Z^(t) F + r(X, Z^(t)) U ),  (13)

where P means a generalized demand-driven regularizer, r(·, ·) denotes a residual term, and F, U are learnable parameters. The instantiated generalized equation (13) results in residual-guided demand-driven networks, which consist of a representation learning term, a data residual term, and a demand-driven activation function.

Graph-topological Regularizer-centered Network Layers.
When the above framework incorporates graph-topological regularizers, it coincides with Framework (7). Several examples can also be derived, as outlined in Table 1 of the Appendix, which can be generalized under Framework (7) as

Z^(t+1) = P( γ g(L) Z^(t) W + r(X, Z^(t)) U ),  (14)

where g(L) Z^(t) W gives a generalized graph term, γ denotes a contraction factor, and W is a learnable parameter. The instantiated equation (14) results in graph-topological demand-driven layers, which consist of a representation learning term, a graph regularizer term, a data residual term, and a demand-driven function.
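A graph-topological propagation of this flavor can be sketched as a damped smoothing iteration over a normalized adjacency, where the contraction factor trades off neighbor smoothing against a data residual term. This is an APPNP-style sketch under assumed notation, not the paper's exact layer:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def graph_layer(A, X, n_layers=10, alpha=0.1):
    """Propagation Z <- (1 - alpha) * A_norm @ Z + alpha * X: the graph
    term smooths representations over neighbors while the residual term
    keeps them anchored to the input features."""
    A_norm = normalized_adjacency(A)
    Z = X.copy()
    for _ in range(n_layers):
        Z = (1 - alpha) * A_norm @ Z + alpha * X
    return Z

# Two connected nodes plus one isolated node: smoothing pulls the
# connected pair together but leaves the isolated node at its input.
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
X = np.array([[1.0], [0.0], [5.0]])
Z = graph_layer(A, X)
print(Z.round(3))
```

The connected pair is drawn toward a common value while the isolated node is untouched, which is the "smooth and continuous representation" behavior the graph regularizer h(·) encodes.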

Beyond Single-modal: Trustworthy Multi-modal Framework From a Universal Perspective
Notations. Suppose there are M modalities of data, denoted as {X^(m)}_{m=1}^{M}, which include mixed known and unknown data in the m-th modality. The similarity among samples in the m-th modality is expressed by A^(m), and a multi-modal open-world co-latent representation is denoted by Z.
Likewise, it is imperative to model the problem in multi-modal scenarios as in (2), and then use machine optimization rules to expand it. In this regard, we can directly extend Framework (6) to a multi-modal demand-driven task-interface by learning the multi-modal open-world co-latent representation z as

z^(t+1) = F( { P( z^(m,t) − Θ_z^(m) ∇f^(m)(z^(m,t)) ) }_{m=1}^{M} ),  (15: multi-modal demand-driven regularizer network layers)

where F denotes a generalized fusion, which can be a weighted average V, auto-weight fusion W, attention mechanism A [31] or trusted fusion T [15], whereas Θ_z^(m) denotes the m-th learnable parameter. Similarly, Framework (7) can also be expanded to a multi-modal graph-topological task-interface as

z^(t+1) = F( { P( z^(m,t) − Θ_z^(m) ∇( f^(m) + h^(m) )(z^(m,t)) ) }_{m=1}^{M} ).  (16: multi-modal graph-topological regularizer network layers)

Herein, we have transitioned from Frameworks (6)-(7) to (15)-(16), resulting in a universal multi-modal trustworthy framework with a single-modal foundation, significantly enhancing the user experience. Furthermore, the above frameworks enhance design-level interpretability in multi-modal scenarios while also increasing generalization and robustness. Here, we can also formalize the above opinion into the following multi-modal bi-level optimization framework [6] as

min_Θ L( z*( {x^(m)}_{m=1}^{M}, Θ ), y ),  (17)

where the upper-level open-world training losses and the lower-level multi-modal regularizer-centered networks together constitute the above frameworks, as shown in Protocol (Algorithm) 2 of the Appendix, and several examples are provided below.
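The generalized fusion F can be sketched for two of its simplest instantiations, a plain average and an auto-weighted sum; the per-modality scores below are a stand-in for learned weights (the attention and trusted fusions discussed above are richer):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def fuse(reps, mode="auto", weights=None):
    """Fuse per-modality representations Z^(m) into one co-latent Z.
    'average' is a plain mean; 'auto' weights each modality by the softmax
    of a stand-in score (here negative feature variance, so noisier
    modalities get down-weighted). Illustrative only."""
    reps = np.stack(reps)                    # (M, n, k)
    if mode == "average":
        w = np.full(len(reps), 1.0 / len(reps))
    elif mode == "auto":
        scores = -reps.var(axis=(1, 2))      # stand-in for learnable scores
        w = softmax(scores)
    else:
        w = np.asarray(weights)
    return np.tensordot(w, reps, axes=1)     # weighted sum over modalities

rng = np.random.default_rng(2)
clean = rng.standard_normal((4, 3))
noisy = clean + rng.standard_normal((4, 3)) * 5.0   # corrupted modality
Z = fuse([clean, noisy], mode="auto")
print("fused shape:", Z.shape)
```

Down-weighting the high-variance modality keeps the fused co-latent representation close to the reliable one, the same intuition behind uncertainty-aware trusted fusion.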

Multi-modal Demand-driven Regularizer-centered Network
Layers. In multi-modal scenarios, we can generalize some examples in Table 3 of the Appendix under Framework (15) as follows

Z^(m,t+1) = P( Z^(m,t) F^(m) + r(X^(m), Z^(m,t)) U^(m) ),  (18)

where network layer (18) is the multi-modal version of (13). In this case, the obtained representation group {Z^(m,t)} requires a generalized fusion, which is denoted as follows

Z^(t) = F( {Z^(m,t)}_{m=1}^{M}; Θ_F ),  (19)

where Θ_F is a learnable parameter in the generalized fusion.

Multi-modal Graph-topological Regularizer-centered Network
Layers. When we consider multi-modal graph-topological regularizer-centered situations, examples in Table 3 of the Appendix can also be included under Framework (16), as shown below

Z^(m,t+1) = P( γ g(L^(m)) Z^(m,t) W^(m) + r(X^(m), Z^(m,t)) U^(m) ),  (20)

where network layer (20) is the multi-modal version of (14), and a generalized fusion is also required. Overall, the transition from a single-modal trustworthy framework to multi-modal ones is a significant stride forward in the advancement of more intricate and effective AI systems.

Discussions and Insights
2.6.1 Relationships with Previous Works. In this subsection, we explore how some previous excellent works can be incorporated into the proposed frameworks while maintaining their trustworthy properties.
Connection with Implicit Networks: As proposed in excellent works such as [25,26], various implicit networks define a fixed-point equation as an implicit layer for aggregation, thereby generating the equilibrium representation. In this context, the proposed frameworks can harness the transformation z* = Fix(B(x, θ)) to build connections to implicit networks, where Fix(B(x, θ)) is a fixed-point equation that can be derived from the proposed framework. For example, the implicit framework in [25] is a special case of generalized network layer (14) (where F is equal to zero and the minus sign before r(·, ·) is placed in the learnable parameters). Consequently, these methods can be seamlessly integrated into the proposed frameworks ((6)-(7) to (15)-(16)) to create various types of trustworthy learning methods that maintain trustworthiness and enhance their diverse theorems and properties.
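The fixed-point view can be illustrated directly: iterating a contractive layer map to convergence yields the equilibrium representation z* = Fix(B(x, θ)). This is a deep-equilibrium-style sketch with an assumed map B(z, x) = tanh(Wz + Ux), not any specific implicit network from the cited works:

```python
import numpy as np

def implicit_layer(W, U, x, n_iter=100):
    """Solve the fixed-point equation z = tanh(W z + U x) by naive forward
    iteration. With ||W||_2 < 1 the map is a contraction (tanh is
    1-Lipschitz), so the iteration converges to the equilibrium z*."""
    z = np.zeros(W.shape[0])
    for _ in range(n_iter):
        z = np.tanh(W @ z + U @ x)
    return z

rng = np.random.default_rng(3)
W = rng.standard_normal((4, 4))
W *= 0.5 / np.linalg.norm(W, 2)     # rescale so the map is contractive
U = rng.standard_normal((4, 6))
x = rng.standard_normal(6)

z_star = implicit_layer(W, U, x)
# At equilibrium, applying the layer map once more changes nothing.
print(np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x)))
```

In practice implicit networks solve this equation with faster root-finders and differentiate through the equilibrium, but the contraction structure is the same as in the unrolled layers above.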
Connection with Graph Convolutional Networks and Variants: Leveraging the insights from some outstanding interpretable graph convolutional networks [38,49], the proposed trustworthy methods can broadly encompass some existing GCN approaches, as depicted in Table 2 of the Appendix. Specifically, some of its variants (such as the connection with hypergraph neural network in Table 3 of the Appendix) can also be crafted and derived into the proposed frameworks. The proposed trustworthy frameworks not only enhance interpretability as these works, but also further increase generalizability (generalized task-interfaces) and robustness (perception of unknown things) in open-world scenarios.
Connection with Other Prior Networks: The above-mentioned network layers can be further extended to multi-modal/view scenarios. In addition, different from [34,37], by using traditional multivariate optimization methods such as the alternating direction method of multipliers, the proposed frameworks can also be extended to multivariate trustworthy network layers, where each subproblem can constitute a subnetwork. Then, we can construct the corresponding trustworthy frameworks according to Protocols 1 or 2. It should be noted that most of the networks in Tables 1-3 of the Appendix are proposed and derived from our frameworks for the first time, which also reflects their versatility and universality.
Indeed, the aforementioned outstanding works (such as [38,49]) may also explore broader frameworks, but each investigates only a single property of a trustworthy framework, such as interpretability. In contrast, this paper aims to enhance multiple trustworthy properties and integrate them into a more comprehensive framework to bridge trustworthiness and open-world learning.
2.6.2 Insight Remarks. Given the extensive research fields involved in each trustworthy property, we aim to provide readers with some insights through the preliminary exploration of the frameworks presented in this paper. Therefore, we offer the following observations on the proposed trustworthy frameworks.

Remark 1: Trustworthy Reclaim.
• Interpretability: The proposed trustworthiness protocol employs machine optimization rules to guide the development of deep networks that prioritize design-level interpretability, resulting in models that have physical meanings and are more easily understood by users. • Generalization: By incorporating representation learning, demand-driven and graph-topological regularizers into the design-level interpretability of the proposed trustworthy protocols, it can offer more adaptability and better support for generalized well-being interfaces and downstream tasks.
Remark 2: Contribution Reclaim.
Note that although some of the methods we utilize are existing works, such as [13,15,31,42], the combination of the proposed frameworks with these methods leads to a more generalized framework that alleviates the challenges of trustworthy learning.

Theoretical Analysis
Convergence: Convergence is a vital property of trustworthy learning; to prove it, we present the following Theorem 1. Theorem 1. Given the bounded damping factor γ ∈ [0, 1), the proposed networks for propagation (such as those in Tables 1-3 of the Appendix) are contraction mappings, and the unique convergent solution Z* can be obtained by the proposed frameworks.
This can be proved by using the properties of matrix vectorization and the Kronecker product together with the Banach Fixed-Point Theorem; kindly refer to Subsection 2.5 of the Appendix for the proof. Regarding the proposed theorem, we draw the following conclusions: 1) Models derived from the proposed framework can achieve optimal values given sufficient training rounds; 2) At least the models exported in this work can ensure convergence; 3) The convergence analysis in the experiments also verifies this point.
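Theorem 1's contraction argument can be checked numerically: with a damping factor γ ∈ [0, 1) and a normalized propagation matrix, two different initializations converge to the same fixed point Z*. This is an illustrative sketch of the Banach fixed-point behavior, not the Appendix proof:

```python
import numpy as np

def propagate(Z0, A_norm, X, gamma=0.5, n_iter=80):
    """Iterate Z <- gamma * A_norm @ Z + (1 - gamma) * X. With damping
    factor gamma in [0, 1) and ||A_norm||_2 <= 1, this map is a
    contraction, so by the Banach fixed-point theorem it converges to a
    unique Z* regardless of the initialization Z0."""
    Z = Z0.copy()
    for _ in range(n_iter):
        Z = gamma * A_norm @ Z + (1 - gamma) * X
    return Z

rng = np.random.default_rng(4)
A = rng.random((5, 5)); A = (A + A.T) / 2
A_norm = A / np.linalg.norm(A, 2)          # spectral norm exactly 1
X = rng.standard_normal((5, 2))

Z_a = propagate(np.zeros((5, 2)), A_norm, X)
Z_b = propagate(rng.standard_normal((5, 2)) * 10, A_norm, X)
print("gap between two initializations:", np.linalg.norm(Z_a - Z_b))
```

The gap between trajectories shrinks by a factor of at most γ per step, so after enough iterations the two runs coincide to machine precision, illustrating both convergence and uniqueness.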
Complexity: The single-modal and multi-modal trustworthy learning frameworks require O(n²) and O(Mn²) per epoch for forward/backward propagation, respectively, where n is the number of samples and M is the number of modalities.

EXPERIMENTS
To support the proposal in this paper, several example networks were implemented using the proposed frameworks as the backbone to verify its rationality. The following subsections provide a quantitative and qualitative analysis of the experimental results, including performance and trustworthiness. Note that 10% of the ground-truth labels are used for training the supervised loss, while the pseudo labels generated for unlabeled samples are used for training the pseudo-label loss. For a more specific introduction to the experimental setup, including dataset details, compared methods and parameter settings, kindly refer to Section 3 of the Appendix.

Experimental Results
Since performance serves as a foundation for trustworthiness, we conducted experiments on both single-modal datasets and more complex multi-modal datasets for evaluation. The experimental results and analysis are as follows.
Open-world Semi-supervised Node Classification. Table 1 records the semi-supervised node classification accuracy results under a 10% ratio of labeled nodes with different unknown classes, from which we draw the following observations: • The methods derived from the proposed single-modal trustworthy frameworks achieve superior results on most datasets, compared with baselines and their variants. • These methods show good performance compared to four non-GCN-based approaches across these datasets. This indicates that the methods exported by the proposed protocol can effectively improve the expression ability at the feature-level. • The proposed frameworks demonstrate superior performance compared to four GCN and three GNN variants across most datasets. This can be attributed to the fact that we not only consider interpretable graph-topological dissemination structures, but also incorporate demand-driven representation learning, enabling the capture of crucial node features. • Overall, the well-designed trustworthy frameworks enable the derived examples to outperform competitors on open-world tasks while also ensuring trustworthy results, owing to the transparent network design and the broad task-interface in the framework protocols, which also exhibit some ability to perceive unknown entities.
Open-world Multi-modal Semi-supervised Classification. Table 2 displays the multi-modal semi-supervised classification accuracy results under a 10% ratio of labeled samples with different unknown classes, and we observe the following: • Following the trend of single-modal tasks, the proposed methods still achieve good performance on multi-modal tasks compared to other multi-modal extension methods. • Compared with non-GNN-based, GCN-based and GNN-based multi-modal methods, the proposed networks achieve good performance and are compatible with complex data processing capabilities. The expressive power of multi-modal models has also been indirectly improved through the designed components. • The results indicate that the proposed methods can maintain trustworthiness while achieving excellent performance in complex open-world scenarios, thanks to the designed trustworthy learning frameworks.

Trustworthiness Study
• Interpretability of the Proposed Frameworks: Tables 6-7 of the Appendix indicate that adding graph-topological terms (including hypergraphs) to the trustworthy frameworks leads to better performance in most single-modal and multi-modal learning cases. In addition, we visually display some instances in Figure 5 (for more detailed information, refer to Figures 1-2 of the Appendix), which show that the graph-topological frameworks perform better in recognizing unknown samples. These figures provide an intuitive illustration of the effectiveness of considering the graph topology in improving model performance with stronger recognition ability for unknown instances, thereby demonstrating the post-hoc-level interpretability of the proposed frameworks. • Generalization of the Proposed Frameworks: The previous content has already presented the generalization of the proposed frameworks in designing networks, regularizers and downstream tasks. Next, we reveal the generalization of fusion in trustworthy multi-modal frameworks. Figure 3 shows that trusted fusion can achieve favorable fusion results in most situations by dynamically alleviating the impact of uncertainty caused by data heterogeneity, thus achieving positive results after fusion procedures (for more detailed information, refer to Figures 3-4 of the Appendix). This is an inspiring concept: when constructing multi-modal trusted networks, we could utilize such fusion methods to promote the trustworthiness of the multi-modal open-world co-latent representation. • Robustness of the Proposed Frameworks: In addition to the designed robustness to open environments, model robustness is also presented here. The influence of the hyper-parameters λ₁ and λ₂ of the open-world training losses can be observed in Figures 5-8 of the Appendix. These figures depict the impact of tuning these parameters within the range {0.001, 0.01, · · · , 100} on the frameworks. The results show that the proposed frameworks maintain stable performance across most values, indicating their robustness. However, when λ₂ is set to a small value, performance is significantly affected, indicating the crucial role played by the unknown loss in the generalization of the entire model.

Parameter Sensitivity
• Impact of Layers: Figure 7 (a) depicts the impact of the number of layers on performance (for more detailed information, refer to Figure 9 of the Appendix). Generally, as the number of layers increases, performance stabilizes after fewer than four layers, and in some situations, performance

CONCLUSION AND FUTURE WORK
In this paper, we proposed a novel perspective that provides a family of designed neural approaches for bridging trustworthiness and open-world learning, including the accomplishment of enhancing various trustworthy properties. We enhanced trustworthiness, including interpretability, generalization, and robustness, through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. By following the proposed trustworthy open-world protocols, we could develop methods that perform well across a wide range of applications while maintaining trustworthiness. Extensive experiments across the fields involved in each trustworthy property have demonstrated that our exploration of the open-world trustworthy frameworks presented in this paper can provide readers with valuable insights. In the future, we will consider how to bridge trustworthiness and open-world learning under unsupervised scenarios.

RELATED WORK ON TRUSTWORTHY LEARNING
Above, we have presented the relevance of trustworthiness in AI. Next, we comprehensively review several properties of trustworthy learning. Definition of Trustworthy Learning: Trustworthy learning attempts to construct a framework that can guarantee both task performance and human- and environment-friendly trustworthiness [19].
Interpretability: Interpretability of trustworthy learning means that systems should be transparent to humans; that is, humans need to understand how a model learns patterns and makes decisions [2]. Overall, the relevant research classifies this concept into two levels: design-level [21,43,47] (which this paper also pursues) and post-hoc-level [1,22,27] interpretability.
Generalization: Generalization of trustworthy learning implies that the model possesses general cognitive capabilities covering more situations [20,46,48], including limited data, complex environments, domain shift, etc.
Robustness: Robustness of trustworthy learning refers to the ability of systems to handle open-world environments or various unknown data instances [5,19,30].
Limitations, Solutions and Beyond: Although a great deal of excellent research exists on the various properties of trustworthy learning, there are still few attempts to consider these properties within a universal framework. It is laborious and challenging to explore effective architectures that perform better and generalize well for trustworthy learning. To this end, we strive to accomplish this by presenting a comprehensive trustworthy framework protocol capable of embracing more trustworthiness properties in an executable setting. Beyond that, we also promote the integration of trustworthy learning frameworks from a universal perspective, which constitutes the main contribution of this work. Given the extensive research fields involved in each trustworthy property, we aim to provide readers with some initial ideas through the exploratory treatment of the trustworthy frameworks presented in this paper.

FRAMEWORK SUPPLEMENTARY
In this section, we provide supplementary details on the full names, derivations and other aspects of the example networks implied by the proposed frameworks.

Full Name of the Proposed Abbreviated Network
Here, we give the full names of the proposed abbreviated networks of this paper in Tables 1-3. To provide a unified framework for both single-modal and multi-modal learning, the proposed method can incorporate networks such as S-Net, SG-Net, SL-Net, and SGL-Net, which focus on learning single-modal open-world latent representations using features and graphs for downstream tasks. Additionally, the proposed method can also incorporate networks such as MS-Net, MSG-Net, MSL-Net, MHSL-Net, MSGL-Net, and MHSGL-Net, which are designed for learning co-latent representations based on multi-modal features and graphs for downstream tasks, where the adjacency graph of each modality is constructed by k-NN methods.
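The per-modality k-NN graph construction mentioned above can be sketched as follows (a minimal NumPy version with Euclidean distances; the actual distance metric and value of k used in the paper may differ):

```python
import numpy as np

def knn_adjacency(X, k=10):
    """Build a symmetric k-NN adjacency graph from a feature matrix X (n x d)."""
    # Pairwise squared Euclidean distances
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    n = X.shape[0]
    A = np.zeros((n, n))
    # Connect each node to its k nearest neighbors
    idx = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    A[rows, idx.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint selects the other
    return np.maximum(A, A.T)

# One adjacency graph per modality, e.g.:
# S = [knn_adjacency(X_m, k=10) for X_m in modalities]
```

In a multi-modal setting, this is applied independently to each modality's feature matrix, yielding one graph per modality.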
SL-Net. First, we instantiate the optimization-inspired objective function of SL-Net under Problem (2). Subsequently, we utilize the machine optimization equation (5) in the main paper to solve this problem. We then link the resulting updating rules to construct single-modal networks under Framework (7), where F = I − (1/L)DᵀD, W = (1/L)I, U = (1/L)Dᵀ, and H = Prox, with I an identity matrix. In particular, for the sparsity and row-sparsity parameterized regularizers H = Prox or S = Prog of the P instantiations in Table 1 of the main body, the proximal operator acts column-wise on Z = [z1; z2; · · · ; zi; · · · ] and can be realized by activation functions such as ReLU, SeLU, ELU, etc. The above equation is an example form of the proposed single-modal generalized Framework (7). Other networks derived from Frameworks (6) or (7) in Table 1 can be derived analogously, and Protocol 1 can be executed for trustworthy learning.
MSL-Net. Similarly, the machine optimization equation (5) is also used to tackle the corresponding problem (28). Then, the row-sparsity-based sub-variate heterogeneous representation network with partial parameterization can be obtained, where F = I − (1/L)DᵀD, W = (1/L)I, U = (1/L)Dᵀ, and S = Prog, with I an identity matrix. Once we obtain the sub-modal latent representations, we utilize a generalized fusion to obtain the co-latent representation Z. This finishes a complete derivation example of MSL-Net, which learns a co-latent representation. The above equation is also an example form of the proposed multi-modal generalized Framework (16). Other networks derived from Frameworks (15) or (16) in Table 3 can be derived analogously, and Protocol 2 can be executed for trustworthy learning. The remaining instance single-modal (S-Net, SG-Net, and SGL-Net) and multi-modal (MS-Net, MSG-Net, MHSL-Net, MSGL-Net, and MHSGL-Net) networks implied in Frameworks (6)-(7) and (15)-(16) follow similar derivation ideas.
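The unrolling pattern behind these derivations — a gradient-style linear step followed by a proximal operator acting as a nonlinearity — can be illustrated with a generic LISTA-style sketch. Soft-thresholding stands in here for the sparsity prox; the paper's actual Prox/Prog operators, step sizes and learned parameterizations depend on the chosen regularizer P and may differ:

```python
import numpy as np

def soft_threshold(z, theta):
    """Prox of the l1 norm: an activation-like shrinkage operator."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def unrolled_layers(D, x, num_layers=4, theta=0.1):
    """Generic unrolling: z_{t+1} = Prox_theta(F z_t + U x)."""
    # Step size 1/L with L the Lipschitz constant of the gradient, ||D||_2^2
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    F = np.eye(D.shape[1]) - step * D.T @ D  # F = I - (1/L) D^T D
    U = step * D.T                           # U = (1/L) D^T
    z = np.zeros(D.shape[1])
    for _ in range(num_layers):
        z = soft_threshold(F @ z + U @ x, theta)
    return z
```

Each unrolled layer is one proximal-gradient iteration, so stacking layers corresponds to running the optimizer for a fixed number of steps with learnable components.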

Supplementary of Open-world Losses
It should be noted that we do not minimize the unknown loss, as our objective is to attain recognition by maximizing the uncertainty of the unknown classes. Lastly, we utilize the remaining samples to maximize this loss. Through training with loss function (10), we aim to increase the discriminative power on the recognized classes using the labeled data, while simultaneously maximizing the uncertainty loss to achieve a more balanced output for each sample, which assists in detecting unknown classes.
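A minimal sketch of this objective — cross-entropy on labeled known-class samples, combined with an entropy term that is maximized on the expected-unknown samples — might look as follows (the loss weight `lam` and the exact form of loss (10) are our assumptions):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def open_world_loss(logits_known, labels, logits_candidate, lam=1.0):
    """Cross-entropy on known classes minus entropy of candidate unknowns.

    Minimizing the negative-entropy term maximizes uncertainty on the
    expected-unknown samples, flattening their class distribution.
    """
    p_known = softmax(logits_known)
    ce = -np.mean(np.log(p_known[np.arange(len(labels)), labels] + 1e-12))
    p_cand = softmax(logits_candidate)
    entropy = -np.mean(np.sum(p_cand * np.log(p_cand + 1e-12), axis=1))
    return ce - lam * entropy  # maximizing entropy == minimizing its negative
```

Under this sketch, a candidate-unknown sample with a near-uniform output distribution contributes a lower loss than one with a confidently peaked distribution, which is exactly the balanced-output behavior described above.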

Supplementary of Agent Selection
The question that arises is how to determine the agent. To address this, we utilize the validation set for agent selection. For the samples in the validation set, we perform the same process as in training. We calculate the largest class probability of each sample and average these values to obtain τ1. We then select the top 10% of samples with the highest entropy as the expected unknown-class samples, and denote their average probability by τ2. Finally, we obtain the final agent value by averaging the two, i.e., τ = (τ1 + τ2)/2. Up to this point, the open-world losses and agents are employed to perceive unknown data and help improve the robustness of the overall framework.
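The agent-selection procedure can be sketched as follows (variable names and the softmax-probability input are ours; the 10% ratio and the final averaging follow the text):

```python
import numpy as np

def select_agent(probs, unknown_ratio=0.10):
    """Compute the agent threshold from validation-set softmax outputs (n x C)."""
    # Average of each sample's largest class probability
    t_max = probs.max(axis=1).mean()
    # Per-sample entropy; the top 10% highest-entropy samples are treated
    # as the expected unknown-class samples
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    k = max(1, int(len(probs) * unknown_ratio))
    unk_idx = np.argsort(entropy)[-k:]
    t_unk = probs[unk_idx].max(axis=1).mean()
    # Final agent value is the average of the two
    return (t_max + t_unk) / 2.0
```

At test time, a sample whose largest class probability falls below this agent value would be flagged as belonging to an unknown class.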

Theoretical Analysis
Proof of Theorem 1. For any matrix B ∈ R^{d1×d2}, we denote the vectorization of the matrix by vec(B) and the Frobenius norm of the matrix by ∥B∥_F. Let us first state some preliminary explanations

SUPPLEMENTARY OF EXPERIMENTS
In this section, we present supplementary materials, including dataset details, compared methods, evaluation indicators, implementation details, parameter settings and other experimental results. Details of the single-modal datasets are presented in Table 4 and below.
• Chameleon is a Wikipedia network where nodes represent web pages from Wikipedia and edges indicate mutual links between pages. Node feature vectors are bag-of-words representations of informative nouns in the corresponding pages. Each node is labeled with one of five classes according to the average monthly traffic of the web page.
• CoraFull is a large, well-known citation network where nodes are documents and edges are citation links.
• Film is the actor-only induced subgraph of the film-director-actor-writer network. Each node corresponds to an actor, and an edge between two nodes denotes co-occurrence on the same Wikipedia page. Node features correspond to keywords in the Wikipedia pages. We classify the nodes into five categories in terms of the words on the actors' Wikipedia pages.
• Pubmed is a citation network of articles related to diabetes from the PubMed database, where node attributes are frequency-weighted word frequencies and the labels specify the type of diabetes addressed.
• Cornell, Texas, Wisconsin come from the WebKB dataset.

Details of Experimental Setup
Nodes represent web pages and edges denote hyperlinks between them. Node feature vectors are bag-of-words representations of the corresponding web pages. Each node is labeled as student, project, course, staff, or faculty. • UAI has been utilized in graph convolutional networks for community detection.
For the multi-modal experiments, we employ eight publicly available datasets to simulate a multi-modal open-world scenario: Caltech101, Hdigit, MITIndoor, MNIST, NoisyMNIST, NUS-WIDE, Scene15 and Youtube. Details for all eight datasets are presented in Table 5 and below. For each dataset, a portion of the classes are held out as unknown classes and used for testing, while the remaining classes are used as known classes for training. The data is split such that 10% of the labeled nodes/samples are used for training, 10% for validation, and 80% for testing. The agent for identifying the unknown class is determined using the validation set. The number of unknown classes is varied to evaluate the performance of the models at different proportions of unknown classes. Accuracy (ACC) is used to evaluate performance.
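The class hold-out and 10/10/80 split described above can be sketched as follows (a simplified, unstratified version; per-class balancing details are assumptions):

```python
import numpy as np

def open_world_split(labels, num_unknown_classes, seed=0):
    """Hold out some classes as unknown; split known-class samples 10/10/80."""
    rng = np.random.default_rng(seed)
    classes = rng.permutation(np.unique(labels))
    unknown = set(classes[:num_unknown_classes].tolist())
    known_idx = np.flatnonzero([y not in unknown for y in labels])
    unknown_idx = np.flatnonzero([y in unknown for y in labels])
    known_idx = rng.permutation(known_idx)
    n = len(known_idx)
    train = known_idx[: int(0.1 * n)]
    val = known_idx[int(0.1 * n): int(0.2 * n)]
    # Unknown-class samples appear only at test time
    test = np.concatenate([known_idx[int(0.2 * n):], unknown_idx])
    return train, val, test
```

Varying `num_unknown_classes` reproduces the evaluation at different proportions of unknown classes.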
• MLP is a basic feed-forward artificial neural network; we select it as the baseline.
• AE is an autoencoder network that automatically learns representations using an encoder and a decoder, and MAE is a multi-modal version of AE.
• SF-Net introduced a network-based feature selection architecture with an attention module for feature generation and a learning module for problem modeling.
• WAST-Net presented a new efficient unsupervised method for feature selection based on sparse autoencoders.
• DUA-Net is guided by the uncertainty of data estimated from the generation view, and integrates intrinsic information from many views to obtain noise-free representations.
• TMC-Net provided a paradigm for multi-view learning by dynamically integrating different views at an evidence level.
• DSRL-Net proposed a block-wise deep neural network with learnable activation functions for learning data-driven sparse regularizers adaptively in an end-to-end manner.
• GCN is a regularized, scalable approach for semi-supervised learning on graph-structured data based on a variant of convolutional neural networks.
• SGCN designed a simplified GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers.
• FAGCN jointly conducts a GCN with a self-gating mechanism, which can adaptively integrate different signals in the process of message passing.
• GAT operates on graph-structured data leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions.
• GNN-LF/GNN-HF proposed a unified objective optimization framework for graph neural networks with a feature-fitting function and a graph term.
• HGNN+ constructs high-order multi-modal/multi-type data correlation modeling graph neural networks to learn an optimal representation.
These comparison methods utilize the same settings and losses as the proposed frameworks to test their performance in the open world, because they have not previously been tested in an open-world environment. As for the multi-modal comparison methods, we extend the above algorithms to multi-modal scenarios, where each modality utilizes the corresponding network to learn a sub-modal latent representation, and then uses the four fusion methods employed in this paper, selecting the optimal one as the most competitive open-world perception result. The implementation details of the proposed networks follow Tables 1-3, and the training losses for semi-supervised classification are (10) in the main body of the paper. The number of training epochs is set to 200, the learning rate is set to 0.001, and the number of layers is set to the optimal value as tested in Figures 5-8. The results in Table 5 consider the performance with the best fusion effect. The multi-modal adjacency graphs S are constructed by k-NN methods. The construction of the Laplacian and hyper-Laplacian matrices follows [44].
Figure 9: Parameter sensitivity analysis of networks derived from the proposed trustworthy frameworks in terms of layer impact on single-modal and multi-modal semi-supervised classification tasks.
Figure 10: Parameter sensitivity analysis of networks derived from the proposed trustworthy frameworks in terms of convergence behavior on single-modal and multi-modal semi-supervised classification tasks.