VMetaFlow: A Meta-Framework for Integrating Visualizations in Coordinated View Applications

The analysis and exploration of complex data sets are common problems in many areas, including scientific and business domains. This need has led to substantial development of the data visualization field. In this paper, we present VMetaFlow, a graphical meta-framework to design interactive and coordinated views applications for data visualization. Our meta-framework is based on data flow diagrams since they have proved their value in simplifying the design of data visualizations. VMetaFlow operates as an abstraction layer that encapsulates and interconnects visualization frameworks in a web-based environment, providing them with interoperability mechanisms. The only requirement is that the visualization framework must be accessible through a JavaScript API. We propose a novel data flow model that allows users to define both interactions between multiple data views and how the data flows between visualization and data processing modules. In contrast with previous data-flow-based frameworks for visualization, we separate the view interactions from data items, broadening the expressiveness of our model and supporting the most common types of multi-view interactions. Our meta-framework allows visualization and data analysis experts to focus their efforts on creating data representations and transformations for their applications, whereas nonexperts can reuse previously developed components to design their applications through a user-friendly interface. We validate our approach through a critical inspection with visualization experts and two case studies. We have carefully selected these case studies to illustrate its capabilities. Finally, we compare our approach with the subset flow model designed for multiple coordinated views.

the visualization framework must be accessible through a JavaScript API. Then, VMetaFlow graphical user interface (GUI) allows users to specify how data flows between views and how these views are coordinated in the same DFD. In contrast to other works, we integrate visualization properties (e.g., data encodings, camera position, axis title, selections) within the DFD and decouple them from the data under analysis. Our approach allows users to model the most common types of interactions in juxtaposed and data partitioned multi-view applications (e.g., zoom and pan coordination between two scatter plots; see [2] for further details) and enables fine-grained control over their scope.

VMetaFlow was designed for users with different levels of knowledge in data visualization and analysis. Advanced users can extend the functionality of our meta-framework by creating the basic components of the DFD, whereas users without experience in the field of visualization can reuse these components to design applications to analyze their data.

Our meta-framework aims to cover two types of visualization applications: exploratory analysis and fixed workflow applications. Both approaches are essential tasks in the data science field. The former allows users to interact with their data to gain preliminary insights and form hypotheses, whereas the latter allows them to apply predefined methods to known problems for knowledge extraction. These two tasks benefit from defining the application behavior using a DFD. This feature is essential for exploratory analysis because it allows users to interactively modify the workflow structure to test new hypotheses and to include and test new functionalities during the prototype development. Furthermore, having access to the DFD helps users understand the data transformation process and enables fast prototyping.

We argue that visualization itself is not enough to achieve the objectives of the aforementioned application types because relying only on perception could lead to a misunderstanding of data meaning. Consequently, we claim that it is essential to provide statistical and data processing tools to confirm or reject the user's intuitions. Our meta-framework addresses this by means of cards in which the user can encapsulate scripts written in several programming languages, such as R, JavaScript, or Python.

The main contribution of this paper is to propose and develop a visual meta-framework and data flow model to integrate third-party visualization frameworks and data processing algorithms, providing them with interoperability mechanisms to design coordinated view applications (see Fig. 1). As mentioned, the only requirement is that the visualization framework must provide JavaScript APIs. Data processing algorithms can be written in several programming languages: R, JavaScript or Python.
Our data model decouples the visualization properties from the data, allowing users to represent both the flow of data and the interactions between coordinated views. Finally, to support the two types of applications mentioned above and users with different levels of expertise in data analysis and visualization, we have taken into account the following requirements:

ity. The architectural details of VMetaFlow are shown in Section VII. Section VIII describes the case studies and the external critical inspection carried out to validate our system. Finally, Section IX presents our conclusions.

Declarative frameworks for visualization (see Section II-A) are widely adopted general-purpose frameworks for data

we find from early approaches, such as SAGE [13], to academic prototypes, such as Lyra [14] or iVisDesigner [15], or commercial frameworks with thousands of users, such as

Wilkinson's Grammar of Graphics [17] and HiVE [18] were among the first works to propose the use of declarative specifications for visualization. These approaches provided a high level of abstraction and supported rapid analysis, but did not offer fine control of graphics and interactions [8]. More recent declarative visualization frameworks, such as D3 [4], ggplot2 [5], plotly [6] or Vega [7], have a higher degree of customization, with the drawback of being more complex for users with little or no programming experience. Vega-Lite offers a higher degree of abstraction, but still requires a strong background in programming [3]. In our meta-framework, visualization experts can create data views using the previously mentioned declarative visualization frameworks.

Traditionally, a program is generated from a structured sequence of words with a syntactical meaning. Alternatively, visual programs use graphics and two-dimensional layouts as part of the program specification [24].
This approach is easier to understand and work with, as it resembles the human mental representation of problems. Unlike the one-dimensional textual way, visual programming uses higher-level descriptions of the program functionality. Users without programming skills find this approach more accessible [25].

Commonly, visual programming environments use data flow diagrams. These programming environments are based on boxes that encapsulate a piece of functionality and wires that connect them. Data are transformed in the diagram boxes and flow from one execution node to the next one following the diagram wires. Early works that followed this approach

VOLUME 10, 2022

FIGURE 1. Gene Regulatory Network Analysis. This figure introduces an application example used as a first case study. The left side of the figure shows a DFD composed of several cards for loading (1a, 4a, 7a), filtering (5a, 6a) and visualizing data (2a, 3a, 5a, 8a). The force-directed graph filters the data shown in the heatmap and the line chart. We implemented all these data views using different frameworks. On the other side, a visualization panel with four views was created from the visualization cards defined in the previous DFD. We use the same numbering for the DFD cards and their corresponding visualizations in the panel. We label cards and visualizations with the suffixes a and b, respectively.

to tackle several problems inherent in using this approach for data analysis and visualization. The authors proposed a data flow system that supports several types of data, visualizations, and processing modules. Moreover, they focus their work on the optimization of the data flow execution. To this end, they introduce a cache to store the intermediate results of each node and a graph analysis step to identify which nodes should be re-executed.
The main focus of these systems is the data processing; the visualization is relegated to represent the results of this task. The use of charts for data exploration and queries is severely restricted.
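The caching strategy mentioned above (store each node's intermediate result and re-execute only the nodes affected by a change) can be illustrated with a short sketch. This is not the implementation of any of the cited systems; the node names, the `edges` list and the `nodes_to_reexecute` helper are hypothetical and only assume that a change invalidates the modified node and everything downstream of it.

```python
from collections import deque


def nodes_to_reexecute(edges, changed):
    """Return the changed node plus every node reachable downstream of it."""
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
    dirty, queue = {changed}, deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in dirty:
                dirty.add(nxt)
                queue.append(nxt)
    return dirty


# Hypothetical diagram: load -> filter -> plot, and load -> stats.
edges = [("load", "filter"), ("filter", "plot"), ("load", "stats")]
# Hypothetical cache of intermediate results, one entry per node.
cache = {"load": "rows", "filter": "subset", "plot": "chart", "stats": "summary"}

dirty = nodes_to_reexecute(edges, "filter")
for node in dirty:
    cache.pop(node, None)  # invalidate only the stale results
```

After the filter node changes, only `filter` and `plot` lose their cached results; `load` and `stats` can still be served from the cache.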

Focusing on the exploratory analysis task, ExPlatesJS [36] and VisTrails [37] are worth noting. The former implements a methodology for separating the visual exploration steps, whereas the latter provides data provenance support. In addition to the previously mentioned systems, KNIME [38] is an open-source environment for data science that allows collaboration and workflow reusability. Although VisTrails and KNIME are extensible, all the previous systems mainly rely on their built-in modules and visualizations. Moreover, these systems offer limited support for multiple-view coordination. ExPlatesJS and VisTrails do not provide any support to this end, whereas KNIME follows a publish-subscribe event model: all visualizations can subscribe to filter or selection events and publish them. The scope of this model is limited to composite views, i.e., events are only distributed between visualizations placed in the same visualization window. This model is not extensible to new interactions, and its restricted scope hinders its applicability in the context of exploratory analysis.

by default is not self-evident to users from their user study on coordinated and multiple views. Furthermore, Wang Baldonado et al. [40] proposed perceptual cues to display the relationship between views as one of the guidelines when using multiple views. However, none of the previously mentioned approaches offer this type of visual support. In con-

suffers from limitations that prevent its use in this case. We will discuss these limitations in Section IX.

DFDs naturally divide problems into a sequence of functional blocks. They are directed graphs that represent how information flows from one node to another. VMetaFlow's cards are the graph nodes, and they encapsulate data processing algorithms and visualizations, whereas connections specify how the data are transmitted between them. In our approach, visualization and data analysis experts focus their efforts on designing and implementing the application's functional blocks (cards) using the technologies that better fit their needs. Then, the application behavior is described by connecting cards in a graphical interface at a higher level of abstraction.
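To make the graph view concrete, the following minimal sketch models cards as nodes and connections as directed edges, and evaluates the diagram in dependency order. It is only an illustration under simplifying assumptions: the card names, the `run_dfd` helper and the one-shot batch evaluation are hypothetical and do not describe how VMetaFlow schedules its cards.

```python
from graphlib import TopologicalSorter

# Each hypothetical card maps a dict of named inputs to one output value.
cards = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "top": lambda inputs: inputs["sort"][-1],
}
# Connections are directed edges: (source card, target card).
connections = [("load", "sort"), ("sort", "top")]


def run_dfd(cards, connections):
    """Run every card once, after all the cards it depends on."""
    predecessors = {name: set() for name in cards}
    for src, dst in connections:
        predecessors[dst].add(src)
    outputs = {}
    for name in TopologicalSorter(predecessors).static_order():
        inputs = {src: outputs[src] for src in predecessors[name]}
        outputs[name] = cards[name](inputs)
    return outputs


results = run_dfd(cards, connections)
```

Because each card only sees the values arriving on its incoming connections, a card can be replaced by another implementation without touching the rest of the diagram.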

Cards are the fundamental component of our model. A card implements a specific functionality, e.g., a scatter plot or a clustering algorithm, using the visualization or data processing technology chosen by the programmer. From a technical perspective, a card is an encapsulated self-contained module and has everything it needs to carry out its activity. Therefore, cards can be developed independently. To minimize the integration effort, VMetaFlow's API is constrained to three operations: receive data, send data, and, optionally, change its internal state (properties). Although allowing programmers to store the internal state of the cards is not an essential feature, it improves the expressiveness of the card. For example, it is possible to adapt a card's behavior to the user selection history. There are two categories of cards:

(1a), transformation of displacement maps to meshes (3a) and dimensionality reduction (5a, 12a, 16a). The data processing cards (3a, 5a, 12a, 16a) are processed on the server. Tabular data are visualized using scatter plots (2a, 6a, 11a, 13a, 17a) and a bar chart (10a). The meshes are visualized using three-dimensional viewers (4a, 9a, 15a, 19a). Data selections created in the scatter plots are used in group creation cards (8a, 14a, 18a). The created groups return to their associated scatter plots and go into the mesh viewers (9a, 15a, 19a, 4a) and the bar chart (10a). The selection combination card (7a) performs an OR operation on the selections coming from the first three scatter plots (2a, 6a, 11a). Finally, the selections from scatter plots (2a), (13a) and (17a) are propagated to the last three-dimensional viewer (4a). On the bottom half, the panel embeds the visualization cards (2a, 13a, 17a, 4a, 10a). We use the same numbering for the DFD cards and their corresponding visualizations in the panel. Cards are labeled with the suffix a, whereas visualizations use the suffix b.
Some works, such as ExPlates, integrate visualizations into the DFD. This approach simplifies the design and exploratory analysis tasks. When creating multi-view applications, designers must consider the cognitive overload of end users [40]. To alleviate visual cluttering, other tools, such as KNIME, separate the multiple-view visualization from the DFD window. Similarly, our GUI separates the data views from the DFD tab using multiple panels. Panels are frames where the results of the DFDs can be visualized.

As mentioned in Section I, VMetaFlow aims to cover two types of visualization tasks: exploratory analysis and fixed workflow applications. Data exploration applications benefit from showing the DFD to the final users because it allows them to follow the knowledge extraction process. However, in the static workflow case, users perform a set of actions in a well-known and fixed manner. In this scenario, the DFD might distract them from their final goal. The right image in Fig. 1 and the bottom image in Fig. 2 show two panels that arrange visualizations included in their respective DFDs. The number of views that can be shown to the user effectively is limited. In this regard, panels are a way to reduce the number of views shown to the end users at one time. Data views can be grouped based on different criteria, such as a common task or feature. This strategy leaves more room for the visualizations and the DFD in their respective tabs, and it prevents the saturation of the visual channel, enabling users to cope with complex visualizations and covering a broader range of applications.

New cards can be added to VMetaFlow through its GUI.

New visualizations can be created using any web-based framework, such as plotly or Vega. Our Card Creation Assistant (see Fig. 4) reduces the integration effort, guiding the design of new cards in four steps:

Dependencies. In the first step, the card designers select the JavaScript library files. If the desired library is not available on the platform, it can be uploaded at this point.

Description. In the second step, the card's description and supported connections are defined. The designer has to specify the card's name, identifier, and description. The connection docks can be added in this step by providing their type, name, and description. Additionally, the cardinality of the input docks has to be defined. This parameter establishes the number of incoming connections allowed in a given dock. Lastly, there are default input and output options docks. These connections allow synchronizing options between cards and defining the card options in other cards of the DFD.
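The role of the cardinality parameter can be sketched as follows. The `InputDock` class, its field names and the error raised are hypothetical and chosen for illustration only; the sketch merely assumes that a dock rejects a new connection once it already holds as many incoming connections as its cardinality allows.

```python
from dataclasses import dataclass, field


@dataclass
class InputDock:
    """Hypothetical model of an input dock declared in the Description step."""
    name: str
    dock_type: str
    description: str
    cardinality: int  # number of incoming connections allowed
    sources: list = field(default_factory=list)

    def connect(self, source):
        """Register an incoming connection, enforcing the cardinality."""
        if len(self.sources) >= self.cardinality:
            raise ValueError(f"dock '{self.name}' accepts at most "
                             f"{self.cardinality} connection(s)")
        self.sources.append(source)


dock = InputDock("selection", "selection", "Rows selected upstream", cardinality=1)
dock.connect("scatter_2a")
try:
    dock.connect("scatter_6a")  # exceeds the declared cardinality
    second_accepted = True
except ValueError:
    second_accepted = False
```

Declaring the limit on the dock itself lets the editor refuse an invalid wire when the user draws it, rather than failing later when data arrives.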

Options. The following step provides a way to create the card's options. Options are a set of parameters that define the card's behavior. For example, a box plot's options include its axes, titles, range, and variables.

Code. In this step, the designers have to establish the card behavior. The visualization cards are implemented in JavaScript using the web-based visualization API selected in the first step. They are required to complete the init and update functions. The init function is called once and initializes the visualization, whereas update is called when any input connection or state property changes its value. Both functions receive the same set of parameters: (i) the DOM container where the visualization will be embedded, (ii) the Input object that receives all the incoming connection values, (iii) the current card internal State, (iv) a DataHandler instance and (v) a callback function (setProperty). The State parameter grants access to the current internal variables and to the output values. In fact, output connections share the selected State variables with other cards. Developers can add or modify the card's internal State and output connection values using the setProperty synchronous callback. Finally, the DataHandler object optimizes the tabular data treatment (for further details, see Section VII-B).
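The calling convention described above can be mimicked with a small model. Visualization cards themselves are written in JavaScript; the Python sketch below only imitates the order of calls and the five-parameter signature. The `CardHost` class, the `output_keys` argument and the state variable names are hypothetical, the container and data handler are passed as `None` placeholders, and, to keep the example short, the model triggers update only when an input changes.

```python
class CardHost:
    """Toy host that drives a card's init and update functions."""

    def __init__(self, init, update, output_keys):
        self.init, self.update = init, update
        self.output_keys = output_keys  # state variables shared as outputs
        self.inputs, self.state = {}, {}

    def set_property(self, key, value):
        """Synchronous callback used by the card to modify its state."""
        self.state[key] = value

    def _call(self, fn):
        # Parameters: container, input, state, data handler, setProperty.
        fn(None, self.inputs, self.state, None, self.set_property)

    def start(self):
        self._call(self.init)  # called exactly once

    def set_input(self, name, value):
        self.inputs[name] = value
        self._call(self.update)  # called on every input change

    def outputs(self):
        return {k: self.state[k] for k in self.output_keys if k in self.state}


def init(container, inputs, state, data_handler, set_property):
    set_property("history", [])


def update(container, inputs, state, data_handler, set_property):
    # Keep the selection history in the state and expose the latest selection.
    set_property("history", state["history"] + [inputs["selection"]])
    set_property("last_selection", inputs["selection"])


host = CardHost(init, update, output_keys=["last_selection"])
host.start()
host.set_input("selection", [1, 2])
host.set_input("selection", [2, 3])
```

The card keeps its full selection history privately while only `last_selection` is exposed to other cards, which reflects the idea that output connections share selected state variables.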

The pipeline to create processing nodes is similar. The main differences are that data processing scripts only have one function (process) and can be written in R, Python or JavaScript. R and Python scripts run on the server side, whereas JavaScript cards run on the client side. In order to enable interoperability between JavaScript and the other two programming languages, JavaScript objects are transformed to named lists in R and to dictionaries in Python. The process function receives the same kind of parameters as init and update, but it does not have access to a DOM container, and the setProperty callback is replaced by setResult. setResult is used to update the card's internal state and to notify the main thread once processing has concluded. Additionally, the setProgress callback can optionally be used to report the progress made in the data processing.
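A processing card written in Python might look like the sketch below. The exact signature of process and of its callbacks is not given in the text, so the parameter order, the `(key, value)` form of `set_result`, the fractional argument of `set_progress` and the input and state keys are all assumptions made for illustration. The example normalizes one numeric column of a table received as a list of dictionaries.

```python
def process(inputs, state, data_handler, set_result, set_progress):
    """Min-max normalize one column (hypothetical processing card body)."""
    rows = inputs["table"]    # assumed input dock name
    column = state["column"]  # assumed card option
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    scaled = []
    for i, row in enumerate(rows, start=1):
        scaled.append({**row, column: (row[column] - low) / span})
        set_progress(i / len(rows))  # optional progress report
    set_result("table", scaled)      # publish the result when done


# Minimal stand-ins for the callbacks the host would provide.
results, progress = {}, []
process(
    {"table": [{"id": "a", "x": 2.0}, {"id": "b", "x": 4.0}, {"id": "c", "x": 6.0}]},
    {"column": "x"},
    None,  # data handler placeholder, unused here
    lambda key, value: results.__setitem__(key, value),
    progress.append,
)
```

Since the function receives plain dictionaries and reports through callbacks, it can be tested outside the platform with simple stand-ins, as done in the last lines.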

Besides creating new cards, the card collection can be extended by importing third-party cards. This functionality

Structure. The shared data structure is defined. No coding skills are needed to define the connection's data structure.

Although JavaScript is a loosely typed programming language, the system checks the data structure integrity before sharing it through a connection to prevent errors.
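Such an integrity check can be sketched as a recursive comparison between a value and the structure declared for the connection. The structure format used here (nested dictionaries, single-element lists and Python types) and the `conforms` and `send` helpers are hypothetical; the paper does not specify how structures are encoded.

```python
def conforms(value, structure):
    """Return True if value matches the declared structure."""
    if isinstance(structure, dict):
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub)
            for key, sub in structure.items()
        )
    if isinstance(structure, list):  # [item_structure]: homogeneous list
        return isinstance(value, list) and all(
            conforms(item, structure[0]) for item in value
        )
    return isinstance(value, structure)  # leaf: a Python type


def send(value, structure, deliver):
    """Deliver a value through a connection only if it is well formed."""
    if not conforms(value, structure):
        raise TypeError("value does not match the connection structure")
    deliver(value)


# Hypothetical structure of a selection connection.
selection_structure = {"ids": [str], "source": str}

received = []
send({"ids": ["g1", "g2"], "source": "scatter"}, selection_structure, received.append)
try:
    send({"ids": [1, 2], "source": "scatter"}, selection_structure, received.append)
    rejected = False
except TypeError:
    rejected = True
```

Checking at the sending side means a malformed value never reaches downstream cards, so an error is reported at the card that produced it.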

Our meta-framework is an additional software layer placed over existing data visualization frameworks and thus increases the use of computational resources. In this section, we describe the architectural details and how the system reduces this overhead. In order to ensure reproducibility, the full implementation of VMetaFlow can be downloaded from https://github.com/VMetaFlow/VMetaFlow.

In Section VI-B, we discussed the complete process of integrating new cards; Fig. 6 shows an example of the coding step of this process. This figure illustrates the minimum JavaScript code required to integrate a basic bar chart defined with Vega-Lite. The first step in the init method is to obtain data from the data handler. Next, we code the visualization in the format dictated by Vega-Lite. For the dimensions of the visualization, we use the size of the container that will display it. In addition, two inputs have been defined in the previous step to allow user interaction with the visualization. The value of these fields is extracted through the state variable and used in the visualization specification. Thus, the users can control which data fields they use for grouping and aggregation. Lastly, we use the Vega-Lite API to produce the bar chart in the container. For the update method, we recreate the visualization by calling the init method.
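The specification built in that coding step is a JSON document, so its shape can be shown independently of the JavaScript code in Fig. 6. The sketch below is not the code from the figure: the `bar_chart_spec` helper, the sample rows, the field names and the choice of `sum` as the aggregation are assumptions, while the keys used (`data`, `mark`, `encoding`, `aggregate`, `width`, `height`) are standard Vega-Lite properties.

```python
import json


def bar_chart_spec(rows, group_field, value_field, width, height):
    """Build a Vega-Lite bar chart specification as a plain dictionary."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "width": width,    # taken from the container size in the card
        "height": height,
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            # Field chosen by the user for grouping.
            "x": {"field": group_field, "type": "nominal"},
            # Field chosen by the user for aggregation (sum is an assumption).
            "y": {"aggregate": "sum", "field": value_field, "type": "quantitative"},
        },
    }


# Hypothetical rows and option values.
rows = [
    {"cluster": "A", "count": 3},
    {"cluster": "A", "count": 2},
    {"cluster": "B", "count": 4},
]
spec = bar_chart_spec(rows, "cluster", "count", width=400, height=300)
spec_json = json.dumps(spec)  # the string a web client would hand to Vega-Lite
```

Keeping the grouping and aggregation fields as parameters mirrors the two user-controlled inputs described above: changing either one only requires rebuilding the specification and rendering it again.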

In this case study, we mimic the analysis task defined to validate VisFlow [23] to show our system's flexibility and potential to implement any visualization workflow. This task is common in the genetics domain since it shows the regulations between the genes, namely which genes activate or repress others. The input data set is a validated regulatory network for Th17 cells [44], which play a central role in the progress of autoimmune diseases and cancer [45]. The application consists of four coordinated views: a gene regulatory network, a heatmap that shows the gene expression matrix, a line chart for the gene expression profile and a data table with further information on each gene. Each view was implemented with a different framework to show how our system provides interoperability capabilities. The gene regulatory network is a directed weighted graph whose nodes are genes, and the edges are transcription factors. The weights represent the confidence score of regulation. This graph was implemented using VivaGraphJS [46]. The heatmap was designed with Vega-Lite [3] and illustrates the gene expression matrix, where rows are genes and columns are experimental conditions. The line chart is a complementary graph that displays the selected gene expression profiles, namely the rows of the gene expression matrix. We used plotly [6] for its implementation. Finally, we created the data table with DataTables [47]. The code of most cards was taken from public and private repositories, and most of the development time was spent on integration tasks and testing. The developer estimates having spent 75 minutes on integration tasks. The card implemented with VivaGraphJS was the most challenging since it required the development of interface controls.
The analysis starts by loading a planar parametrization (displacement map) of all spines' surfaces [49]. Each displacement map is a 65 × 65 pixel color image. Unlike 3D meshes, dimensionality reduction can be directly applied to these structures since they have a fixed size. In this study, we want to compare the performance of several dimensionality reduction techniques when clustering the planar parameterization of dendritic spines according to their shape. In summary, the process we followed for this task consisted of (i) applying the dimensionality reduction technique to the displacement maps, (ii) visualizing the result using a scatter plot, (iii) creating clusters from the data and (iv) analyzing whether all spines in each cluster have the same shape.

The displacement map set of all the spines is stored in a tabular structure for simplicity. Card (1a) is in charge of data loading. These data are transferred to dimensionality reduction cards: (5a) performs a Principal Component Analysis (PCA), (12a) uses Isomap, and (16a) uses a Uniform Manifold Approximation and Projection (UMAP). These algorithms have been implemented using the same card type, configuring its options to select the technique and the number of dimensions in the projected space (3 for PCA and 2 for the rest). To visualize the results, we use three scatter plots for PCA (2a, 6a, and 11a), one for Isomap (13a), and one for UMAP (17a). The scatter plots propagate their selection to cards designed to create groups of elements (8a, 14a, 18a). The scatter plots linked to the PCA combine the three selections using an OR operator (7a) before (8a). Then, we recover the surface mesh of every spine from its planar representation (3a) to allow the visual inspection of the clusterization in their corresponding 3D viewer (9a, 15a, 19a). Finally, to compare the clusterization performed with the three dimensionality reduction methods, we use a 3D viewer (4a) and a bar chart (10a) that allows data superimposition. To avoid visual cluttering, (4a) only shows the data selected on (2a), (13a), and (17a).

We group the visualization cards into four panels to reduce the number of views shown at the same time. Three panels are used to cluster spines using different projections. The last panel compares the clusterization results. We illustrate this final panel on the bottom side of Fig. 2.
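The role of the selection combination card (7a) can be illustrated with a few lines of code. The sketch assumes that a selection is a set of spine identifiers and that the OR operator is a set union; the `combine_or` helper, the identifiers and the variable names are hypothetical.

```python
def combine_or(*selections):
    """Combine several selections with a logical OR (set union)."""
    combined = set()
    for selection in selections:
        combined |= set(selection)
    return combined


# Hypothetical selections made on the three PCA scatter plots.
selection_2a = {"spine_01", "spine_02"}
selection_6a = {"spine_02", "spine_05"}
selection_11a = set()  # nothing selected in the third plot

# The union is what the group creation card would receive.
group = sorted(combine_or(selection_2a, selection_6a, selection_11a))
```

With a union, a spine selected in any of the three projections enters the group; an AND combination would instead require the spine to be selected in all of them.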

As a result of the above analysis, we conclude that UMAP should be the preferred DR algorithm for an automatic spine

to enable effective use of the meta-framework by users with no programming skills. Since the flexibility of VMetaFlow is one of the system's most relevant advantages, they propose to support natively and efficiently more data types (not just tabular data) to improve capability over existing frameworks. Finally, all experts showed concerns about the data size that can be handled by VMetaFlow. We will further discuss this point in the following section.

Multiple views and interaction have proven to be two of the most valuable approaches to handle complexity in the data visualization field. We designed our system to boost the prototyping of multiple-coordinated-view applications. Each view can be created using the most adequate visualization framework (or even an ad hoc data view implemented by the user), and VMetaFlow simplifies the interaction among them. Cards are the minimal functional unit of our system. Data views and their corresponding interactions are embedded in individual view cards. Users must divide their problem into visualization and data processing cards which can be implemented using the optimal visualization or data processing technologies to solve each problem. Interactions between views can be described at a higher level in the application DFD. This approach promotes modularity and enables extensibility and reusability, which are essential for fast prototyping. Additionally, extensibility and reusability allow this meta-framework to tackle a wide variety of tasks in different scientific and business domains. VMetaFlow's DFDs can be used in three different ways to support the types of applications described in Section I: (1) users can interactively manipulate both the DFD and the application panels (exploratory analysis), (2) users can only visualize the layout (fixed workflow applications), and (3) users can manipulate the layout and examine the DFD, without changing it (exploratory analysis and fixed workflow applications).

We separate the data views from the DFD tab to reduce user cognitive overload. However, displaying the visualization on the DFD simplifies interactive exploratory analysis. We are currently working on offering both possibilities, following a similar approach to VisFlow.
This framework implements a transition animation between the DFD tab and the display panel that allows users to easily understand the correspondence between DFD nodes and views.

In VMetaFlow, interactions between views are explicitly defined in the DFD. Several authors have pointed out that the relationship between coordinated views is not always evident [39], [40], [50]. Displaying interactions in the DFD helps with the implementation, debugging, and use of coordinated view-based applications.

In this paper, we proposed a data flow model that overcomes the limitation of the subset flow model (see Section IV-C for further details

is adequate for small data sets or if the data are not modified. To promote VMetaFlow over other systems, surveyed external experts recommend extending our meta-framework adding native support to other data types, such as graphs, and finding an efficient way to handle non-native data types.

Regarding data analysis capabilities, we have shown in the case studies that VMetaFlow makes it possible to carry out essential operations in this domain, such as dimension reduction. Furthermore, our meta-framework is not limited to those techniques, and any algorithm (implemented in R, Python or JavaScript) can be embedded in a processing card. Some data processing cards run on the server side, allowing computationally demanding tasks to be executed in powerful dedicated servers. We plan to continue developing this idea, preparing the system to run not only processing cards on the server but also visualization ones. We will optimize the system to keep the data transfer between the server and the client to a minimum.

Despite the popularity of web-based visualization frameworks, all surveyed experts agree on the constraints of these systems when handling large data sets. We believe that a future version of VMetaFlow can get around this limitation by adding a third card type. These cards will perform filtering, aggregation, and data derivation on the server side, transferring the results to the client side only when necessary. This approach will alleviate the computational constraints of web-based visualization frameworks.