I'm often testing forecasting models, and what this ends up creating is a bunch of "forecasted" variables that are paired with the "actual" values. R has fabulous faceting capabilities, and I have often wanted to reshape the data in a way where the category of forecasted variable as an identifier, and then two columns that list the forecasted and actual variables. In other words, if the code starts from something like
aAct aPred bAct bPred id 1 1.2076384 -0.6735547 1.4994464 -1.0691975 1 2 0.4999706 -0.7188215 -0.3601551 0.7224729 2 3 1.0340859 -0.1108304 -0.5941295 0.5027085 3And I want to convert it where one column has an id, another one identifies whether I'm forecasting a or b, and a third column that has the forecasted value, and then a fourth column with the actual value.
The procedure in R involves "melting" the data frame and then "casting" it. Melting is rather simple -- you provide a set of identifiers, and then the data frame is melted down to only that identifier, the values, and another indicator variable that tells you what the value is supposed to represent. In the above example, if we let df be the data frame described above, I would run:
cast(df.m, <IDENTIFIERS> ~ <VALUES>)
The second part is known as the casting formula and is the part that I have struggled with. But in its most simplest form, the casted frame will look like something with all the identifiers added together as uniquely identifying units ,and then the <VALUES> variables being the labels for the actual value column. If that sounded confusing, I apologize. Perhaps solving the example would help.
First, I need to find a way to identify whether a row is looking at a or b, and whether it is a forecast or an actual variable. So I first create these variables:
df.m$var = substring(df.m$variable, 1, 1)
df.m$type = substring(df.m$variable, 2)
Which gives me the data frame:
Now I can cast the frame. In this case, I would use the formula
df.mc = cast(df.m, id + var ~ type)
This is how you interpret the formula. Id + var means that every observation is uniquely identified by it's id code and the variable we're forecasting -- a or b. Then "type" on the right side represents the new variable names that will be filled by the values.
Hope this is useful to others so they don't end up spending hours agonizing over the issue as did I.