Specialized Array Manipulation

The manner in which arrays are defined and used is designed to make the process as simple and intuitive as possible. In many cases, you can simply define the arrays you want to use, apply them to the variables you want to use them on, and continue working with little or no change to the model equations that now represent not one, but several values.

Sometimes, however, you may want to do things with arrays that are a bit more specialized. Two common reasons are working with arrays that have repeated dimensions, which can arise when looking at transition flows from and to different states, and performing mathematical manipulation of arrays.

Stella and XMILE provide a number of different ways to work with arrays that can be helpful in these situations.

Note: You can also use Subranges to simplify array manipulation without adding clutter to equations.

Dimension Element Names Outside []

When used inside [], you can simply enter a dimension element name. For example, if we have the dimension size with entries small, medium, large, and the dimension doneness with entries rare, medium, well, then you could use prep_time[medium, medium] to denote how much time it takes to prepare a medium sized meal that will be served medium (between rare and well done) without ambiguity, as long as prep_time is arrayed by size and doneness.

If you want to use an element name outside of the [], then you need to specify which dimension the element came from. This is done using a . as in size.medium (the same notation used to qualify names by module).

Using this notation is convenient, because it allows you to make comparisons such as:

IF size < size.large THEN 1 ELSE 0

which could be used to determine if a small warming oven would be sufficient.

Dimension elements specified in this manner are treated as expressions rather than labels. The first label evaluates to 1, the second to 2, and so on.
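As a rough illustration of the ordinal rule, the following Python sketch mimics how size.large and the comparison above could be evaluated. The names SIZE, element_value, and small_oven_ok are hypothetical, and the sketch assumes that the bare dimension name size evaluates to the position of the element currently being computed.

```python
# Hypothetical stand-in for the `size` dimension from the text.
SIZE = ["small", "medium", "large"]


def element_value(dimension, label):
    """Return the 1-based position that dimension.label evaluates to."""
    return dimension.index(label) + 1


def small_oven_ok(current_label):
    """Mimic: IF size < size.large THEN 1 ELSE 0, for one element of size.

    Assumption: the bare name `size` evaluates to the position of the
    element currently being computed.
    """
    current = element_value(SIZE, current_label)
    return 1 if current < element_value(SIZE, "large") else 0


# One flag per element of the size dimension.
flags = [small_oven_ok(label) for label in SIZE]
```

Because the comparison works on positions rather than labels, reordering the elements of the dimension would change the result.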

Note: Because dimension_name.element is treated as an expression, no validation of array usage occurs.

Repeated Dimensions

Usually, an array is defined by a sequence of unique dimensions (Population[Country,Species,Sex]). If, in the equation for Population, any of Country, Species, or Sex were to be used, their meaning is unambiguous. Suppose, however, that we are looking at the transition from using one product to using another. Then, we might have transitioning[Product,Product] which could have the equation

leaving[Product]*transition_probability[Product,Product]

In this case, there are two occurrences of Product in the variable being defined. On the right hand side, leaving is arrayed by Product once, and transition_probability is arrayed by Product twice. Stella will match the Product in leaving to the first Product in transitioning, the first Product in transition_probability to the first in transitioning, and the second Product in transition_probability to the second in transitioning. As long as both transition_probability and transitioning interpret the first occurrence of Product as the "from Product" (and the second as the "to Product"), that will give us the right result. If that isn't the case, we need to make use of the @ notation described below. In every case the @ notation is likely to be clearer.
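The positional matching described above can be sketched in Python with nested lists. This is only an illustration of the rule, not Stella's implementation; the product names and numbers are made up, and the first index is assumed to be the "from Product".

```python
# Hypothetical three-product example; all numbers are illustrative.
PRODUCT = ["A", "B", "C"]
leaving = [10.0, 20.0, 30.0]
# transition_probability[from][to]
transition_probability = [
    [0.0, 0.5, 0.5],
    [0.25, 0.0, 0.75],
    [1.0, 0.0, 0.0],
]


def transitioning_positional(leaving, probability):
    """Evaluate leaving[Product]*transition_probability[Product,Product].

    The single Product in `leaving` and the first Product in `probability`
    both follow the first index (i) of the result; the second Product in
    `probability` follows the second index (j).
    """
    n = len(leaving)
    return [[leaving[i] * probability[i][j] for j in range(n)]
            for i in range(n)]


transitioning = transitioning_positional(leaving, transition_probability)
```

With each row of transition_probability summing to 1, each row of transitioning sums to the corresponding element of leaving, which is a quick check that "from" and "to" have not been swapped.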

The general rule that Stella uses when matching dimensions which repeat names is to use their position. Thus, in any variable used in an equation, the first occurrence of a dimension matches the first occurrence in the variable being defined, the second the second, and so on. This generally leads to the expected results, though use of the @ notation may make the equations clearer.

In some circumstances, Stella will generate an error rather than use this positional mapping.

  1. If any of the variables used in the equation are used in array builtins and have a * (or other range indicator) at any position, it will be treated as an ambiguity requiring @ notation to resolve.
  2. If the dimension appears more times in a variable than in the variable being defined, it will be treated as an error requiring @ notation to resolve.

For the first case, if you're working with a 2 dimensional array, you may be able to use the inner product operator. For example, c[dim1,dim1] = a*.*b (where both a and b are also arrayed by dim1,dim1) will do the equivalent of SUM(a[@1,*]*b[*,@2]) and be significantly easier to read.

@ Notation for Arrays

Rather than using an array range, you can use @N where N corresponds to the Nth ordinal position in the dimensions for the left hand side variables. This allows you to disambiguate any repeated dimensions. For example, if B[x,x] is being defined with A[x,x] then the equation

A[@2,@1]

would make B the transpose of A (see below for more discussion of transpose).
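To make the lookup concrete, here is a small Python sketch of how A[@2,@1] could be evaluated for each element of B. The helper at_lookup is hypothetical; its only check is the one the text describes, that each N does not exceed the number of dimensions of the left hand side variable.

```python
def at_lookup(array, positions, lhs_index):
    """Evaluate array[@p1, @p2, ...] for one element of the left hand side.

    `positions` holds the 1-based @N numbers; `lhs_index` holds the 0-based
    indices of the left hand side element being computed.
    """
    for p in positions:
        if p < 1 or p > len(lhs_index):
            raise ValueError("@%d exceeds the left hand side dimensions" % p)
    value = array
    for p in positions:
        # @N picks up the index found in the Nth position of the left side.
        value = value[lhs_index[p - 1]]
    return value


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
n = len(A)

# B[x,x] = A[@2,@1]
B = [[at_lookup(A, (2, 1), (i, j)) for j in range(n)] for i in range(n)]
```

Here B[i][j] equals A[j][i], so B is the transpose of A even though both are arrayed by the same dimension twice.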

You can actually use @ in any equation, but the only validity check performed is that the number you specify is less than or equal to the number of dimensions of the left hand side variable. For example, if B[x,y] is the variable being defined, and A[y,x] is used, the equation

A[@2,@1]

would make B the transpose of A, but if you were to redefine A to be A[x,y], there would be no error detected, but the results wouldn't make sense. Had you used A[y,x] or A', on the other hand, the error would be recognized and reported.

So @ can be very useful when you need to disambiguate repeated dimensions, but should be used with care.

Transposing a Matrix

If a matrix has two dimensions A[x,y], you can effectively transpose it in an apply-to-all equation for B[y,x] by simply using A[x,y] in the equation. For example

A[x,y]

would make B the transpose of A (B is arrayed by y, x, so we've swapped the order of the dimensions). Using the transpose operator ' would do the same thing

A'

Though shorter, this notation is less clear. However, when both A and B are arrayed by the same dimension twice (A[x,x] and B[x,x]), we can still use A', whereas A[x,x] would just return A (we could also use A[@2,@1], but A' is easier to read).

Note: For historical reasons, ' is also allowed in variable names without quotation marks. So you can name a variable A', which is simply a name, and not the transpose of A. The software recognizes the transpose operator only after first checking whether A' is a variable name.

Array Ranges

Many array builtins operate by reducing the number of dimensions. For example, SUM(D[x,*]) would sum across the second dimension of D. The * is the standard array range, and it represents all the elements of that dimension. If you change the size of the dimension, it will represent all the elements of the resized dimension.

dimension_name.* Notation

Instead of simply * you can also use dimension_name.* to indicate all values of a dimension. For arrays that use the dimension, the two notations are equivalent. In fact, if array_ab has a and b as its dimensions, the following are all identical:

SUM(array_ab)

SUM(array_ab[*, *])

SUM(array_ab[a.*, b.*])

The last simply serves as a reminder of the dimensions of array_ab.

The dimension_name.* notation can also be used with subranges, as long as the named dimension is a subrange (or superrange) of the corresponding array dimension, as described in Subranges.

When used with subranges, the dimension name specified, not the dimension of the array, determines which dimension the operation occurs over. This is what distinguishes it from the *:subrange and N1:N2 notations discussed below.

Note: The dimension_name.* notation is not supported in software releases prior to version 3.6.1 and will generate an error in those releases.

*:Subrange Notation

Instead of summing over the full dimension, you can sum over a subrange of the dimension. A subrange is any dimension that has all of its elements contained in the dimension the variable is arrayed by (see Subranges). Instead of a *, use *:subrange, where subrange is the name of the smaller dimension. For example, suppose you have a dimension Age with elements A1,A2,...A99,A100P and a dimension Youth with elements A10,A11,...A19,A20. In this case Youth is a subrange of Age and you could write

Total Young Population= SUM(Population[*:Youth])

where Population is arrayed by Age.

Subranges do not need to be contiguous or in the same sequence as the containing dimension. For example, ImportantAges might be defined as A100P, A1, A16, A50 and it would still be a subrange of Age.
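The subrange idea can be sketched in Python by treating each dimension as a list of labels. The helper names and the population figures below are hypothetical; the sketch only illustrates the containment rule and a sum restricted to the subrange.

```python
# Dimensions as lists of element labels, following the Age example.
AGE = ["A%d" % i for i in range(1, 100)] + ["A100P"]
YOUTH = ["A%d" % i for i in range(10, 21)]
IMPORTANT_AGES = ["A100P", "A1", "A16", "A50"]  # not contiguous, not in order


def is_subrange(candidate, dimension):
    """A subrange has all of its elements contained in the dimension."""
    return all(label in dimension for label in candidate)


def sum_over_subrange(values, subrange):
    """Mimic SUM(values[*:subrange]) for an array with one dimension."""
    return sum(values[label] for label in subrange)


# Hypothetical data: 1000 people in every age group.
population = {label: 1000 for label in AGE}

total_young_population = sum_over_subrange(population, YOUTH)
```

Because membership is checked by label rather than by position, ImportantAges passes the same test even though its elements are out of order.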

Using *:subrange notation will give the same answer as subrange.* would if it only appears a single time. However

SUM(arrayed_by_abc[ab.*] + arrayed_by_ab[ab.*])

will give the expected result while

SUM(arrayed_by_abc[*:ab] + arrayed_by_ab[*:ab])

will give an error as Stella will try to use both the dimension abc and the dimension ab to sum. This would result in all combinations of ab and abc on the subrange ab (aa, ab, ba, bb) instead of just aa and bb - 4 instead of 2 values in this case.

N1:N2 Notation

You can also perform operations across contiguous elements in a dimension. Instead of a * you use N1:N2 where N1 and N2 are ordinal dimension element positions (or label names) or expressions. Thus, for example

SUM(A[1:3])

would give the sum over the first 3 elements of A. If A is arrayed by x and x is defined as x1,x2,x3,x4 you could also use

SUM(A[x1:x3])

with the same meaning.

If N1 and N2 are the same, this will select a single element, but remain valid in SUM and other array builtins.

Some caution should be exercised when using ranges in this manner. If you rearrange the elements of a dimension, the range may return unexpected results. The only validity checking done is that N2 is at least as large as N1, and that N2 is not larger than the number of elements in the dimension. This validation is done as part of equation checking when N1 and N2 are either element names or numbers.

Note: N1 and N2 can be element names, numbers or expressions involving other model variables.

If N1 and N2 are expressions involving other model variables, then range checking occurs only at run time and, for performance reasons, the results will be placed in the Message Log only if you've already opened it. Run time range checking doesn't stop the simulation. If N1 is less than 1, it will be treated as 1. If N2 is greater than the number of dimension entries, it will be treated as the number of dimension entries.

As long as N2 is greater than or equal to N1, N1 is greater than 0, and N1 is less than the number of dimension entries, the range will be computed on the valid subset created by adjusting N1 and N2 as necessary. If there is no valid subset (the preceding conditions are not true), then the SUM builtin will return 0, the PROD builtin 1, the MIN builtin inf (infinity), the MAX builtin -inf, the MEAN builtin 0, and STDDEV builtin 0.
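The run time behavior described above can be approximated with the following Python sketch. It reflects one reading of the rules and is not Stella's actual implementation: N1 is raised to 1 if needed, N2 is lowered to the number of entries if needed, and a range that is still empty after adjustment yields the builtin's default value. The function names are hypothetical, and only SUM and MIN are shown.

```python
import math


def clamp_range(n1, n2, size):
    """Adjust a 1-based range N1:N2 to a dimension with `size` entries.

    Returns (lo, hi), or None when no valid subset remains.
    Assumption: adjustment happens first, then the emptiness check.
    """
    lo = max(n1, 1)      # N1 below 1 is treated as 1
    hi = min(n2, size)   # N2 above the last entry is treated as the last
    if lo > hi:
        return None
    return lo, hi


def range_sum(values, n1, n2):
    """Mimic SUM(A[N1:N2]); returns 0 when there is no valid subset."""
    bounds = clamp_range(n1, n2, len(values))
    if bounds is None:
        return 0
    lo, hi = bounds
    return sum(values[lo - 1:hi])


def range_min(values, n1, n2):
    """Mimic MIN(A[N1:N2]); returns inf when there is no valid subset."""
    bounds = clamp_range(n1, n2, len(values))
    if bounds is None:
        return math.inf
    lo, hi = bounds
    return min(values[lo - 1:hi])


A = [10, 20, 30, 40]
first_three = range_sum(A, 1, 3)   # ordinary range
clamped_low = range_sum(A, 0, 2)   # N1 adjusted up to 1
clamped_high = range_sum(A, 3, 9)  # N2 adjusted down to 4
empty = range_sum(A, 3, 2)         # no valid subset
```

The sketch silently returns a default for an empty range, which is why checking the log matters in the real software: the simulation continues and the result alone does not reveal that an adjustment took place.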

Whether an adjustment is required to N1 or N2, or there is no valid range, an error message will be sent to the Simulation Log, if it's open. It's strongly recommended that you open the Message Log periodically on models using range expressions involving other variables.

When N1:N2 notation is repeated in an expression, each occurrence applies to the dimension of the variable it is used on. If the arrays it is used on are dimensioned differently, this will result in an error.

Array Expressions in Array Builtins

Using an array expression in an array builtin can simplify model equations, and bypass the need for extra variables whose only purpose is to be passed to an array builtin to reduce its dimensionality. The most common example is an inner product (and there is an operator for that, as discussed below). For example, to get average production quality across factories, you could write

weighted_quality[Factory] = quality[Factory]*production[Factory]

average_quality = SUM(weighted_quality)/SUM(production)

But the variable weighted_quality doesn't have any clear meaning, and can't be used in other model equations. It's simpler to write:

average_quality = SUM(quality*production)/SUM(production)

This is the same as:

average_quality = SUM(quality[*]*production[*])/SUM(production[*])

When an array expression is evaluated, matching dimensions are used together. Each of the *s in the above equation refers to Factory, and so they match. If you wanted to take the average only over the first two factories, you could use:

average_quality_first_two = SUM(quality[1:2]*production[1:2])/SUM(production[1:2])

In this case, Factory is again matched, but the range 1:2 is also matched. It would be an error to write:

average_quality_first_two = SUM(quality[1:2]*production)/SUM(production)

because Factory is used both over its entire range (implicitly *) and over a subrange.
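As an illustration of the two valid forms above, this Python sketch computes the production-weighted average over all factories and over the first two only. The data and function names are made up; the point is that the same range is applied to every array in the expression.

```python
# Hypothetical data for three factories.
quality = [0.5, 0.75, 1.0]
production = [100.0, 200.0, 100.0]


def average_quality(quality, production):
    """Mimic SUM(quality*production)/SUM(production)."""
    weighted = sum(q * p for q, p in zip(quality, production))
    return weighted / sum(production)


def average_quality_range(quality, production, n1, n2):
    """Mimic SUM(quality[N1:N2]*production[N1:N2])/SUM(production[N1:N2]).

    The same 1-based range is used for both arrays, in the numerator and
    in the denominator, so the ranges always match.
    """
    return average_quality(quality[n1 - 1:n2], production[n1 - 1:n2])


average_all = average_quality(quality, production)
average_first_two = average_quality_range(quality, production, 1, 2)
```

No intermediate weighted_quality array is stored; the products are formed and summed in a single pass, which is what passing an array expression to SUM achieves.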

Note: Using a variable dependent range, such as [x:y], in an array expression is not currently supported.

Inner Products and Array Expressions

The inner product operator *.* and array builtins are used to reduce the dimensionality of variables in an expression, most commonly by summing over them. The inner product operator works on arrays with one or two dimensions, and uses the array names without specifying dimensions. For example, if a is arrayed by d1,d2 and b is arrayed by d2,d3 then c[d1,d3] = a*.*b is the same as c[d1,d3] = SUM(a[d1,*]*b[*,d3]). Similarly, if d is arrayed by d2 then e[d1] = a*.*d is the same as e[d1] = SUM(a[d1,*]*d[*]).
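The two equivalences above can be written out in Python as explicit sums over the shared dimension d2. This is a sketch with made-up numbers and hypothetical function names, meant only to show which index is summed away in each case.

```python
def inner_product_mm(a, b):
    """c[d1,d3] = SUM(a[d1,*]*b[*,d3]): sum over the shared dimension d2."""
    shared = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(shared))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inner_product_mv(a, d):
    """e[d1] = SUM(a[d1,*]*d[*]): sum over the shared dimension d2."""
    return [sum(a[i][k] * d[k] for k in range(len(d)))
            for i in range(len(a))]


# a is arrayed by d1 (3 elements) and d2 (2 elements).
a = [[1, 2],
     [3, 4],
     [5, 6]]
# b is arrayed by d2 (2 elements) and d3 (4 elements).
b = [[1, 0, 2, 0],
     [0, 1, 0, 2]]
# d is arrayed by d2 only.
d = [10, 1]

c = inner_product_mm(a, b)  # arrayed by d1, d3
e = inner_product_mv(a, d)  # arrayed by d1
```

In both cases the last dimension of the first array must match the first dimension of the second, and that shared dimension does not appear in the result.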

Choosing between explicit SUM notation and the inner product notation is largely a stylistic choice. When array elements are repeated, or abstract concepts, the inner product will often be easier to read. When arrays are concrete concepts, such as locations or sizes, the SUM notation may be preferred. For arrays with more than 2 dimensions, you'll need to use the SUM notation, possibly with @ indicators to disambiguate.