There is an industry-wide need to improve internal processes continuously, and in some cases to re-engineer them completely. But across industry this purpose has not yet been met for information workers in any conspicuous manner [4].
Often an interactive information system will be the means to implement the desired process. In effect, the information system is a major part of the implementation plan for the new process. When software is developed to support user work processes (WPs), its purpose is to improve their quality: to make them faster, cheaper, and more accurate, to reduce variance, and so on. However, in current practice the design of the WP and the design of the supporting software actually proceed as independent activities. It should not be any surprise that the resulting pieces are often a mismatch, disappointing users, frustrating computing support people, and failing to produce a worthwhile WP improvement in return for the computing investment.
Early research on software engineering identified a number of breakdowns in joint problem solving between software technologists and domain experts as a source of difficulties in creating usable and successful applications [3]. Jacobson and his colleagues have proposed the concept and notation for Use-cases [9, 10] as a way of bridging the gap between user-centered requirements analysis and technology-centered design. Use-cases have become very popular, and are now being included in the proposed standard Unified Modeling Language (UML) [13].
Use-cases are a very valuable way to describe user work processes that have already been defined and then extend them into software design via other UML notation. This step is very important because making the connection from WP to useful software requirements is slow, unreliable and expensive.
However, it is currently beyond the scope of Use-cases to help with the important analysis of how and where information technology could improve on an existing process. Further, Use-case notation does not support comparison of an "as-is" work process with a "to-be" process, or comparison of two or more alternative applications. Therefore, we are still left with a disconnect between the design of a better work process and the design of the software, which is a key part of its implementation plan. The method to achieve a good match between a better work process and the required software must be iterative and incremental.
The goal of our own research is to demonstrate how software methods for internal information systems can function as the implementation arm of WP quality improvements [1, 2]. In this lifecycle, analysis begins by understanding the existing WP and continues through re-use of business-oriented components (BOCs) during continuous WP revisions for later improvements.
Figure 1 shows how the parallel tracks for WP design and software design can interact to converge. Initially the As-is WP is modeled in terms of user tasks and data to understand what improvements are needed and how they may be accomplished. The As-is software is also modeled, by developing UML event sequence diagrams to document how it currently works.
Figure 1.
The WP analyst then determines how the WP should be improved by developing a Desired model. This WP model's database can be queried to obtain usage distributions of user data over all user tasks. The distributions provide the input to develop a cluster analysis on user data. As will be shown in section 4.2, the clusters provide candidate BOC definitions as requirements that are fed to the software designer. The software designer takes the BOCs as preliminary requirements in the form of classes to develop a Corresponding object-oriented (OO) model. The OO model is developed enough to analyze the technical feasibility of the BOCs. The analyst re-creates use-cases from the WP model as event sequence diagrams that include the necessary middleware, legacy functions or data.
The results of the feasibility analysis on the Corresponding OO model provide feedback to the WP designer on how much it will cost and how long it will take for the software to implement the Desired WP. The WP designer can then perform a cost-benefit analysis and a sensitivity analysis to determine priorities for Feasible changes. We expect this decision will actually require several iterations between WP and software, while several possible pairs of designs get developed and analyzed.
The result of the parallel, interactive design activities is a matched pair of designs: a WP design that is valuable and implementable, and a software design for BOCs that is feasible and cost-effective. When implemented, the library of BOCs supports rapid application development through re-use, since the BOCs' definitions were literally derived from the user tasks that make up the WP.
Our own analysis shows that a major obstacle to better coordination between the people who are responsible for WPs and those who create software lies in the difficulty of translating between WP models and software models. Our focus here will be on the tools to make that translation more systematic and reliable, and far less labor intensive. We will now describe in more detail the software tools that are needed to support the analysis and design parts of the lifecycle. Here we will focus on the tools for modeling and analyzing WPs, for deriving BOC definitions from WP models, and for downloading BOC definitions to a Unified Modeling Language (UML) [13] modeler. Our general approach is to integrate user task analysis, business work process (WP) simulation, and object definition for an integrated method of software engineering.
Several alternate views of how the same work gets performed are valuable for different aspects of design. We assert that a business process can be viewed as some aggregation of user tasks. User-centered models focus on a higher level of description than technology-centered models, which describe how the technology will work or how it will be implemented. Historically it has been very difficult to integrate the two views.
In the specifications of the Workflow Management Coalition [8] the user-centered model corresponds to a process definition, in which the WP is represented by a hierarchy of event sequences. In the specifications of the IDEF3 process modeling language [11] the lowest level is a user task, where users interact with data to accomplish work goals. In terms of UML the users' tasks are initially represented as use-cases [9, 10]. A use-case describes an instance of how users actually accomplish a work goal.
An important strength of representing the user-centered view with UML is that use-cases provide good continuity from user requirements through detailed design for coding. However, the UML 1.1 notation for use-cases does not provide any support for the (re)design of the user work processes, which we contend is often the only real purpose for a business to invest in user-interactive applications.
In order to analyze and design the WP we convert our use-cases into a process-oriented view. Figure 2 shows that an event Begin Week triggers the WP. The rest of the events in the flow diagram represent the completion of user functions. In Figure 3 we show how the user function for Updated Daily Work Hours is further decomposed. Figure 4 is an example of how two resources, User and ETS, interact to change one another's states to accomplish the Entered Hours task.
Figure 2.
Figure 3.
Figure 4.
Figure 5 illustrates an additional modeling requirement to represent data usage within tasks. In order to perform work a WP needs resources of various types. Three of the more important resource types here are Users, Electronic Information Systems, and Manual Information Systems. Technically, information is not a resource because it is not finite. However, the means to access it is finite, so we treat information access as though it were a resource. Each of the three types may serve as a container of information that is needed to complete a task. Examples of non-electronic sources are other people, paper documents, the physical environment, or the users themselves, if the information is retrieved from personal memory. Information containers may also contain other information containers.
Figure 5.
Figure 5 shows how each data concept is represented by an attribute of its containing object. Every datum that is used in a given container state should be represented with a non-null value, whether or not its source is currently electronically stored and accessed. As shown in Figure 5, the association should explicitly capture the attribute values required by the user's task for each state of the container.
Since containment determines the source of an attribute, this feature is also valuable for identifying re-usable portions of legacy systems. By explicitly modeling data sources we can define re-use potential for legacy systems.
A key modeling rule can be seen by comparing Figures 5 and 6. The data elements modeled for each task-state should reflect actual use. In Figure 5 the users must know the value for ETScommand when they are in the state Select DH. The same attribute has a different value in Figure 6 when the user is in state Ready to Enter. But users must also know values for attributes that were null in the previous state, such as regHours, jobNumber, etc.
Figure 6.
We use this technique to accumulate the state-dependent data requirements for each information container incrementally, as we model the use-cases in the WP modeler. This capability allows us to capture the association between each datum and any of the tasks where it is used. These associations provide the basis of the cluster analysis of data elements into preliminary definitions of components.
Figure 7 shows a conceptual matrix of data occurrences distributed over tasks. The data names are listed along the top axis. The task names are listed down the left. The usage of a data concept in a given task is represented as a "1" in the row of "1"s and "0"s following that task name. A "0" indicates no usage in that task. The co-occurrence (or non-occurrence) of data elements with one another within tasks is of particular interest in the design of software to support a business process made up of those tasks. This matrix allows us to begin investigating their co-occurrence.
Figure 7.
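As a minimal sketch of how a matrix like the one in Figure 7 could be assembled from task-level data usage, the Python fragment below builds a binary task-by-data matrix. The function name build_usage_matrix, the input layout, and the assignment of data elements to the three example rows are assumptions made for illustration; they are not the actual extract format of the modeling tool.

```python
from typing import Dict, List, Set, Tuple


def build_usage_matrix(
    task_data: Dict[str, Set[str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (task names, data names, binary matrix).

    Rows follow the task order of the input mapping; columns are the
    sorted data names. A cell is 1 if the task uses the datum, else 0.
    """
    tasks = list(task_data)
    data_names = sorted({d for used in task_data.values() for d in used})
    matrix = [
        [1 if d in task_data[t] else 0 for d in data_names]
        for t in tasks
    ]
    return tasks, data_names, matrix


# Illustrative input only: the usage sets below are assumed, not taken
# from the real WP model.
example = {
    "Select DH": {"ETScommand"},
    "Ready to Enter": {"ETScommand", "regHours", "jobNumber"},
    "Entered Hours": {"regHours", "jobNumber"},
}

tasks, data_names, matrix = build_usage_matrix(example)
for task, row in zip(tasks, matrix):
    print(f"{task:15s}", row)
```

Each column of the resulting matrix is the usage distribution of one data element over the tasks, which is the form of input that a co-occurrence analysis needs.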
We obtained a database extract from the modeling tool in a similar format to that shown in Figure 7. The matrix is actually a set of distributions of data usage over tasks, which provide input to a cluster analysis on data usage. The cluster analysis procedures will be described next.
One of the defining features of a good, high-level object is that it should contain information attributes that need to be used in conjunction with one another. These clusters can be derived by analyzing the distribution of data elements over the tasks where they occur. We used a graph-theoretic approach called PathFinder [14]. This section gives the steps for the PathFinder procedures we used for performing cluster analysis on the data usage distributions.
The results of our PathFinder cluster analysis are shown in several steps in Figures 8-11, along with the user-interface controls of our ClusterVis tool that performs the PathFinder analysis. ClusterVis was built by Dr. Chris Esposito using the Amulet GUI builder, and the Graphics Layout Toolkit from Tom Sawyer. ClusterVis screens shown here are running on IRIX 5.3.
Figure 8 shows the graph-theoretic network which represents the strength of association among all pairs of data concepts for employee time-keeping. A circular layout is shown. Two alternative layouts to the circular are also available, orthogonal and symmetric.
Figure 8.
The construction of PathFinder networks requires three pieces of data for input:
1. An N x N distance matrix, where N is the number of data elements. Coefficients that vary in value from 0 to 1.0 can be calculated to measure the strength of co-occurrence of any pair of data elements with one another from a matrix such as the one shown earlier in Figure 7. There are several formulas for association coefficients. Following the guidelines in Everitt [6] for a sparse matrix input to cluster analysis, we chose the Jaccard coefficient. The matrix of coefficients was calculated for exhaustive pairings of data elements to represent the strength of user need to use them together in common tasks.
2. A value for Q, from 1 to N-1. Increases in the Q value result in fewer edges in the final network. Usually, the selection of a good Q-value is not deterministic and requires some investigation.
3. A value for R, from 1 to infinity, with infinity requiring only ordinal assumptions about the N x N distance matrix.
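The first input can be illustrated with a short Python sketch that derives an N x N matrix from a binary task-by-data matrix using the Jaccard coefficient. The conversion of the similarity coefficient to a distance as 1 minus Jaccard is an assumption made here (it is consistent with perfectly correlated elements having distance 0.0), and the function name and sample matrix are hypothetical.

```python
from typing import List


def jaccard_distance_matrix(matrix: List[List[int]]) -> List[List[float]]:
    """Pairwise Jaccard distances between the columns of a 0/1 matrix.

    Rows are tasks, columns are data elements. Distance is assumed to be
    1 - (tasks using both) / (tasks using either).
    """
    n_tasks = len(matrix)
    n = len(matrix[0]) if matrix else 0
    # For each data element, the set of task indices in which it is used.
    columns = [
        {t for t in range(n_tasks) if matrix[t][j]} for j in range(n)
    ]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            union = columns[i] | columns[k]
            # Two elements that are never used share nothing: similarity 0.
            sim = len(columns[i] & columns[k]) / len(union) if union else 0.0
            dist[i][k] = dist[k][i] = 1.0 - sim
    return dist


# Assumed sample: 3 tasks (rows) by 3 data elements (columns).
usage = [
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
]
dist = jaccard_distance_matrix(usage)
for row in dist:
    print([round(x, 3) for x in row])
```

In this sample the second and third data elements are used in exactly the same tasks, so their distance is 0.0, while the first element shares only one of three tasks with each of them.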
The standard Pathfinder algorithm examines every pair of nodes i, k (data elements) and keeps their direct connecting edge Eik in the final network if Dik <= DSPik, where:
Dik is the distance associated with Eik
Pik is an Allowable Alternative Path from i to k if it has <= Q edges
SPik is the shortest of all Pik
DSPik is the length of SPik
The length of SPik will be calculated using the Minkowski [7] metric:
$$ \mathit{DSP}_{ik} = \left( D_{ia}^{R} + D_{ab}^{R} + D_{bc}^{R} + \cdots + D_{jk}^{R} \right)^{1/R} $$
The sparsest network results from Q = N-1 and R = infinity. This network is the union of all minimal cost spanning trees for the complete graph represented by the input matrix. Reducing either Q or R adds additional edges into the network.
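The edge-retention rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ClusterVis implementation: the function name pathfinder_network is hypothetical, the shortest allowable path lengths are found by a simple dynamic program over paths of at most Q edges, and a floating-point tolerance is used when comparing Dik with DSPik.

```python
import math
from typing import List, Set, Tuple


def pathfinder_network(
    dist: List[List[float]], q: int, r: float = math.inf
) -> Set[Tuple[int, int]]:
    """Return the undirected edges (i, k), i < k, kept for parameters Q and R."""
    n = len(dist)

    def combine(a: float, b: float) -> float:
        # Minkowski length of a path extended by one more edge.
        if math.isinf(r):
            return max(a, b)
        return (a ** r + b ** r) ** (1.0 / r)

    # best[i][k]: shortest path length from i to k using at most m edges,
    # starting with m = 1 (the direct edge) and growing to m = q.
    best = [row[:] for row in dist]
    for _ in range(q - 1):
        nxt = [row[:] for row in best]
        for i in range(n):
            for k in range(n):
                for j in range(n):
                    if j == i or j == k:
                        continue
                    cand = combine(best[i][j], dist[j][k])
                    if cand < nxt[i][k]:
                        nxt[i][k] = cand
        best = nxt

    edges = set()
    for i in range(n):
        for k in range(i + 1, n):
            # Keep the direct edge if no allowable path is strictly shorter.
            if dist[i][k] <= best[i][k] or math.isclose(dist[i][k], best[i][k]):
                edges.add((i, k))
    return edges


# Assumed 3-node example: the direct edge 0-2 is long.
d = [
    [0.0, 1.0, 1.5],
    [1.0, 0.0, 1.0],
    [1.5, 1.0, 0.0],
]
print(sorted(pathfinder_network(d, q=2, r=math.inf)))  # sparsest setting
print(sorted(pathfinder_network(d, q=2, r=1)))         # smaller R keeps more
```

With R = infinity the two-edge path 0-1-2 has length max(1.0, 1.0) = 1.0, which is shorter than the direct distance 1.5, so edge 0-2 is dropped. With R = 1 the same path has length 2.0, so the direct edge is kept, illustrating that reducing R adds edges.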
Figure 8 shows the results of a graph-theoretic representation which displays a network structure of arcs for the strength of associations among all pairs of data elements. The spatial layout in Figure 8 is only an aid to understanding. Unlike multidimensional scaling, the spatial relations are not usually significant in themselves. The network structure of the arcs is of greatest interest. Some research has shown that interesting clusters appear as cliques, near-cliques and stars [5]. These patterns are sub-graphs that can be identified using the ClusterVis cluster finding algorithms, which are available in the Analysis pulldown menu. Algorithms have been implemented for stars, cliques, and perfect correlations.
A perfect correlation among large portions of data can be a characteristic of mainframe applications. The term "monolithic" has also been used for this characteristic. It can be seen in the current example by using the Edge Visibility control on the far right side, and setting it to display only those arcs with a distance = 0.0 in Figure 9. The contrasting network in Figure 10, without any arcs = 0.0 shows how much of the network they occupy.
Figure 9.
Figure 10.
Once clusters have been identified, ClusterVis has a node editor that allows them to be placed into a higher-level node. For example, Figure 11 shows how the node editor can collapse the cluster of perfectly correlated items into a higher-level node. Our analysis tells us that the existing support application is very monolithic, and that we should investigate a software design that will be more flexible. The nodes in Figure 11 represent clusters of user data which were distributed over As-is user tasks in a correlated manner. Since they tend to be utilized at the same times and in the same tasks, we will use them as a starting point to analyze how a web front-end could be added to increase flexibility.
Figure 11.
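The idea of grouping perfectly correlated items into higher-level nodes can be sketched in Python as below. This is only an illustration, assuming that perfect correlation corresponds to a pairwise distance of 0.0; the function name, the tolerance, and the sample data are hypothetical and do not describe the ClusterVis cluster-finding algorithms.

```python
from typing import List


def perfect_correlation_clusters(
    names: List[str], dist: List[List[float]], tol: float = 1e-12
) -> List[List[str]]:
    """Group data elements connected by zero-distance arcs.

    Each returned list is a candidate higher-level node. Elements with no
    zero-distance partner come back as single-member groups.
    """
    n = len(names)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        members = []
        # Depth-first walk over arcs whose distance is (numerically) zero.
        while stack:
            i = stack.pop()
            members.append(names[i])
            for k in range(n):
                if not seen[k] and dist[i][k] <= tol:
                    seen[k] = True
                    stack.append(k)
        clusters.append(sorted(members))
    return clusters


# Assumed sample distances for three data elements.
names = ["ETScommand", "jobNumber", "regHours"]
dist = [
    [0.0, 2 / 3, 2 / 3],
    [2 / 3, 0.0, 0.0],
    [2 / 3, 0.0, 0.0],
]
clusters = perfect_correlation_clusters(names, dist)
print(clusters)
```

In this sample, jobNumber and regHours collapse into one candidate node, while ETScommand remains on its own.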
The higher-level nodes can be viewed as clusters of attributes and, as such, constitute candidate BOCs. The next step was to treat the clusters as object classes to begin building an OO model for analysis and software design. Figure 12 shows the Authorization class, whose definition was downloaded directly from ClusterVis onto IBM's OBJChart.
Figure 12.
We chose IBM's OBJChart because its development is centered on satisfying the OMG's Unified Modeling Language (UML), it interfaces with a strong visual programming environment, and the file specification was provided to us, along with excellent support.
The ClusterVis analysis program is display oriented; it formats its data for screen display. So we built our own program for interfacing with OBJChart. We first modified ClusterVis to capture the cluster definitions for processing. The output for each cluster is the name of the cluster and the names and types of the attributes it contains.
The OBJChart input requirement is for a file in the .mdl format. OBJChart is object-oriented, so several types of data are used by it. However, the scope here is limited to classes containing objects which are data items. A cluster becomes a class and a data attribute becomes an object.
The translation program is called UCDE Filter. It is a straightforward filter that translates cluster terminology into the object terminology expected by OBJChart while keeping the structure essentially the same.
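The shape of such a filter can be sketched in Python as follows. This is not the UCDE Filter itself and does not produce the OBJChart .mdl format, whose syntax is not reproduced here. The Cluster type, the function names, the sample attributes, and the JSON output are all assumptions chosen only to show the mapping from cluster to class and from data attribute to object while the structure is preserved.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """A cluster as captured from the analysis: a name plus typed attributes."""
    name: str
    attributes: List[Tuple[str, str]]  # (attribute name, attribute type)


def cluster_to_class(cluster: Cluster) -> Dict[str, object]:
    """Rename cluster terms to object terms; the structure is unchanged."""
    return {
        "class": cluster.name,
        "objects": [
            {"name": attr_name, "type": attr_type}
            for attr_name, attr_type in cluster.attributes
        ],
    }


def translate(clusters: List[Cluster]) -> str:
    """Serialize all translated clusters as JSON (a placeholder target format)."""
    return json.dumps([cluster_to_class(c) for c in clusters], indent=2)


# Hypothetical attributes for the Authorization cluster.
authorization = Cluster(
    name="Authorization",
    attributes=[("jobNumber", "string"), ("regHours", "number")],
)
output = translate([authorization])
print(output)
```

A real filter would replace the JSON serialization step with a writer for the target modeler's file specification.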
In Figure 12 the software designer has also added a number of other new classes for software ArchitectureComponents and for the different types of Users of the system.
In Figure 13 the designer is using objects from each of the classes to develop an event trace diagram that will satisfy the meta use-case for EnterDailyTime. As can be seen, a key aspect of the BOC design is to exploit the existing mainframe databases, ims_DB and db2_DB. In this manner the software designer can determine the feasibility of building a web front end that will implement the BOCs and support the required use-cases.
Figure 13.
The resulting UML model serves as a feasibility analysis that can be fed back to the WP designer. Our intent is to support a technical engineering dialog between WP design and OO design, based on quantifiable value to the WP and technical feasibility for the software.
It could be very valuable during the dialog with the domain expert if the OO designer could offer a credible estimate of development costs and time schedules for the software. To support cost and schedule estimation we are currently investigating an interface to download the UML model to a project estimation tool, such as Price-S. The recently developed POPS method for estimating software complexity appears to have good predictive validity [12], and there is preliminary work on an interface from UML to Price-S.
To support WP value estimation we have begun developing modeling and simulation techniques for the IDEF3 Process Description language [11]. IDEF3 is complete and consistent enough for powerful discrete event simulation engines. The results of simulation can not only provide insight about how to improve a WP, but also generate estimates for performance variables such as mean cycle-time, variance in cycle-time, and error rates, which are often sensitive to information distribution and quality. Improvement in these variables can be quantified in terms of costs.
We have also begun investigating how our tools can be applied to estimate re-use. One way to view re-use is in terms of how much a WP can change and still be supported by the same set of BOCs. We have begun developing metrics and algorithms to compare the amount of change between several PathFinder networks. We will take a WP model through several versions that anticipate changes in business requirements, such as increased production rate, cost reduction, etc., and then derive the corresponding PathFinder networks. If the metric shows that the network is insensitive to changes in the WP, it may mean that the set of BOCs has good re-use expectancy.
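To make the idea of comparing networks concrete, the Python sketch below computes one possible change measure: the Jaccard similarity between the edge sets of two networks over the same data elements. This particular metric is an assumption chosen for illustration, not the metric under development, and the function names and sample edges are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]


def edge_set(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Normalize undirected edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def network_similarity(a: Iterable[Edge], b: Iterable[Edge]) -> float:
    """Shared edges divided by all edges in either network (1.0 = identical)."""
    ea, eb = edge_set(a), edge_set(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0


# Assumed networks derived from two versions of a WP model.
version_1 = [("jobNumber", "regHours"), ("regHours", "ETScommand")]
version_2 = [("regHours", "jobNumber"), ("jobNumber", "ETScommand")]

print(network_similarity(version_1, version_2))
```

Under this assumed measure, a value near 1.0 across anticipated WP versions would suggest that the network, and hence the derived clusters, changed little.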