The data consists of measurements of spinal bone mineral density.
Several such measurements were taken from 261 North American adolescents
over a few years. The entire dataset is called bone
and can
be found in the R
package loon.data
.
As can be seen, there are five variates. The idnum
uniquely identifies each of the 261 adolescents (N.B. these are not
numbered 1 to 261), sex
identifies their sex, and
ethnic
their “ethnicity/race”. The response variate of
interest is rspnbmd
which is a relative measure of spinal
bone mineral density determined as the ratio of the difference in bone
mineral density as measured on two consecutive visits divided by their
average. Similarly, the explanatory variate age
is the
average of the adolescent’s age in years on those two visits.
In this vignette we investigate the fit of smoothing splines to this data.
To begin, execute the following code:
# The plot
x <- bone$age
y <- bone$rspnbmd
# A scatterplot
p <- l_plot(x, y,
color="darkgrey",
xlabel="age", ylabel="rspnbmd",
showGuides = TRUE, showScales = TRUE,
itemLabel = paste0("IDnum = ", bone$idnum, "\n",
bone$sex, "\n",
"Age: ", bone$age),
showItemLabels = TRUE,
linkingGroup="Bone density",
title = "Spinal bone mineral density (rspnbmd)")
Two windows will have appeared once the above code has been executed.
One is the plot p
, the other the inspector
of the plot p
.
The plot is interactive. Hovering the mouse over a point in the plot,
for example, will pop up the itemLabel
for that point.
Scrolling on mouse wheel (or equivalent) over the plot
zooms in (or out) on the plot; note that the zoom is
centred at the mouse position. For horizontal zooming
only, hold the “control” key down while scrolling; for vertical
zooming only hold the “alt” (or cmd
on a Mac) key
while scrolling.
Zoom in anywhere on the plot. Notice that the inspector displays a miniature version of the whole plot as its World View at the top of its display. The plot is bounded in the World View by a grey rectangle and the region of the plot that is displayed is shown as a brighter region bounded by a black rectangle. This bright display region can be grabbed by clicking (and holding depressed) the left (or primary) mouse button. Moving the mouse around while keeping the left button depressed moves the bright region in the inspector which in turn cause the display in the plot to update accordingly. In this way, the inspector World View can be used to pan the entire display. Alternatively, panning may also be effected by right (or secondary) button clicking on the interior of the plot, holding that mouse button down and moving the mouse. In the inspector the region moves with the mouse, in the plot the background does.
Panning and zooming can occur either on the inspector plot or on the
plot itself. In either case, the panning or zooming is constrained to
the horizontal when the control key is held at the same time and to the
vertical when the alt
or cmd
key is held.
To get the displayed scatterplot back to its original scale, in the
inspector click on scale to: plot
. Alternatively the same
effect can be hand programmatically as l_scaleto_plot(p)
.
(To scale to all layers in the plot use
l_scaleto_world(p)
.)
See help(l_plot)
for more details and examples.
In the inspector window, immediately below the World View there are
several tabs, the first of which is the
Analysis tab. The first subsection of the analysis tab
contains plot attributes. The values here were determined when the plot
p
was created according to the values of the arguments
given to l_plot(...)
. These can be changed in the inspector
by toggling the check boxes.
The complete set of arguments that could have been used at the time
of creation can be had by querying the plot p
as
names(l_info_states)
. The values can also be accessed and
changed programmatically for example as in
The vertical variate, rspnbmd
, measured the
change in spinal bone mineral density. Anything above zero
indicates an increase, anything below a decrease; the magnitude is the
rate of change in density. It might be of interest, then, to add a
horizontal line to the plot at zero. This is accomplished by
layering a line on the plot p
as
follows:
axis <- l_layer_line(p,
x=extendrange(x, f=0.5), y=c(0,0),
label="axis", linewidth=2,
color = "black",
dash=c(10,10),
index="end") # last argument places axis behind other layers
This “axis” can be turned on and off via the inspector or
programmatically. On the inspector, click on the Layers
tab. The layers appear in a list ordered from top to bottom in the
inspector in the order in which they are displayed in the plot. The
axis
appears below the Scatterplot and hence is displayed
behind the points.
Selecting the axis, the up and down buttons at the bottom of the list allow the axis to be placed above or below the Scatterplot. The axis (or Scatterplot) can be moved up or down in the display. Either can be made (temporarily) invisible by clicking on the icon showing a cartoon eye with a stroke through it and made visible again by clicking on the cartoon eye. Try it.
The axis (or any other layer, e.g. the Scatterplot), could be removed entirely with the minus sign. Don’t do this right now; click on the “Analysis” tab instead.
A histogram can be constructed in the same way as for
numeric values. Now, because sex
is a
factor
, the result is essentially a simple bar plot with a
layer labelling the bars by the corresponding sex
:
Note that the inspector now shows the histogram (i.e. barplot) in the World View and that the plot section of the Analysis now has options peculiar to a histogram.
The histogram and the scatterplot both have the same
linkingGroup
“Bone density” and the inspector shows that
the 2 plots are linked. Selecting the left most bar of
the histogram highlights all of the females in both plots. Switching
back and forth between the two bars while observing the scatterplot
shows that the pattern for males seems to be shifted slightly to the
right of those for the females.
Similarly, selecting any point in the scatterplot causes a corresponding slice of the bar in which it appears in the histogram to be highlighted.
Multiple selections can made by holding the shift key while selecting. Alternatively, clicking on the background, holding the mouse button down while sweeping out a rectangle will highlight all data objects which intersect with the rectangle.
Once selected, the display of the points can be
modified by clicking on any of the colour patches to
change their colour, or the glyph symbols to change the shape of points,
the -
and +
signs to change size, and the
deactivate
(and reactivate
) to remove the
points from (or return them to) the display.
Try selecting the points in the scatterplot and making various modifications. Note that because the displays are linked the changes are effected (where sensible) in both displays. Note that once the points have been coloured it is also possible to select points by colour from the inspector.
Note that with shift sweeping (sweeping while the
shift key pressed), multiple selections can be made. Couple this with
dynamic
selection that deselect
s or
invert
s and very complex patterns of selected points can be
constructed.
To return the displays to their original configuration, from the
inspector reactivate all of the points, then select all
from the Select
part of the Analysis
panel,
then select the filled circle glyph shape, and a single colour from
those available (selecting the colour +
will pop up a
colour picker) To return to the original colour execute
p["color"] <- "darkgrey"
.
A second scatterplot could display other variates. For example, plotting the age versus the patient ID number gives:
p2 <- l_plot(bone$idnum, bone$age,
xlabel="idnum", ylabel="age",
linkingGroup="Bone density",
title = "ID numbers and age")
Ideally this plot would look like fairly uniform scatter. Assuming
that idnum
was assigned with recruitment there are some
patterns. In the middle of the idnum
range, for example,
there appears to be a preponderance of older ages followed immediately
by a preponderance of younger ages.
We might investigate how the change in idnum
from low to
high manifests itself in the relationship between bone mineral density
and age by brushing the points in p2
and
observing the effect in p
. To do this, click on
p2
so that the inspector has p2
as its focus
(appears in the inspector World View). In the inspector
Select
panel (on the Analysis
tab) select
by: brushing
. A rectangle will appear in p2
;
this is the brush.
Since we are interested in observing the relationship between bone
density and age conditional on idnum
, we
need to shape the brush to be a long, relatively thin, vertical brush.
The brush is reshaped by selecting the box in its lower-right corner and
moving it until you get the shape desired. The brush will maintain that
shape (unique to p2
) until it is again changed.
Now clicking anywhere in p2
will have the brush follow
the mouse (while the mouse button is depressed) and highlight all points
located within the rectangle. For example beginning at the lower left
corner of the scatterplot and moving the mouse left to right
horizontally, a tall narrow brush should select points with the same (or
nearly the same) idnum
and the relationship between bone
density and age as the idnum
increases can be seen in the
original scatterplot p
.
To have a sticky brush, or have the brush accumulate
the selections brushed, simply use the shift key as before. Again the
nature of the brushing can be changed by selecting different
dynamic
modes. To turn off brushing select
sweeping
in the Select
panel of the inspector
for p2
.
Zooming and panning in
p2
also reveals some interesting structure. Horizontally
zoom on p2
until each idnum
is well separated.
Then horizontally pan across the idnum
s (most easily done
from the inspector World View).
It becomes easy to see that each subject (idnum
) appears
one, two, or three times. Because each value is the change in
bone density (and so must be calculated on the basis of two visits) this
means that each person had two, three, or four visits. Checking the
scales
and guides
boxes of the plot panel in
the inspector and panning reveals also that for every
idnum
, the difference in age
are is at most 2
and does not appear to span more than 3 years. Together, this suggests
that the data may have been collected in a single time period of about 3
years. Moreover, the bulk of those idnums
which have three
entries occur early in the order of idnum
, possibly meaning
early in the recruitment.
To summarize the relationship, first add a straight line fitted by
lm()
using the function l_layer_smooth()
and
the method "lm"
.
l_layer_smooth(p, method = "lm",
label = "straight line fit",
linecolor = "firebrick",
linedash = c(4,4),
linewidth = 4)
The same function could also be used to add a smooth (default
method = "loess"
). Instead, we will add a smoothing spline
from the splines
package and use it to fit bone mineral
density as a function of time (i.e. to the data of p
).
library(splines)
# Fit a smoothing spline
fitsmooth <- smooth.spline(x, y, df=5)
xOrder <- order(x)
smooth <- l_layer_line(p,
x = x[xOrder],
y = predict(fitsmooth, x = x[xOrder])$y,
label = "smooth fit",
linewidth = 4,
color = "blue")
Unlike the straight line fit, the smooth shows that the change in spinal bone mineral density rises up to about 12 years of age and then declines thereafter ultimately hitting zero.
Of course this is the aggregate behaviour, over both sexes. It might be interesting to see how this changes for males and for females. We could do this by adding a smooth for each sex but there may be other subgroups of the data that we would like to investigate. To that end we introduce a dynamic update to the smooth.
## Define the update function
updateSmooth <- function(myPlot, minpts, df, color="blue") {
## Get the values for x and y from the plot
##
## For x
xnew <- myPlot["xTemp"]
if (length(xnew) == 0) {xnew <- myPlot["x"]}
## For y
ynew <- myPlot["yTemp"]
if (length(ynew) == 0) {ynew <- myPlot["y"]}
## Now **only** use the active selected points to construct the smooth
sel <- myPlot["selected"] & myPlot["active"]
xnew <- xnew[sel]
ynew <- ynew[sel]
Nsel <- sum(sel)
if (Nsel > 3 & diff(range(xnew)) > 0) {
## Find the range of the selected x values
xrng <- extendrange(xnew)
xvals.temp <- seq(from=min(xrng),
to=max(xrng),
length.out=100)
## Redo our smooth **only** if we have enough points
if ((Nsel > minpts) & (minpts > (df + 1))){
fit.temp <- smooth.spline(xnew, ynew, df=df)
ypred.temp <- predict(fit.temp,x=xvals.temp)$y
## update the smooth
if (smooth %in% l_layer_ids(myPlot)) {
## reconfigure the smooth with new data
l_configure(smooth, x=xvals.temp, y=ypred.temp)
} else {
## If the smooth has been deleted, then we recreate it
## (N.B. in the global environment)
smooth <<- l_layer_line(myPlot,
x=xvals.temp,
y=ypred.temp,
label="smooth fit",
linewidth=4,
color = color)
}
}
}
## Update the tcl language's event handler
tcl("update", "idletasks")
}
Now, we would like to have this update called whenever any
interesting change in state occurs. There are numerous such possible
states (see names(p)
, names(l_info_states(p))
,
or l_help("learn_R_bind")
). Here we bind an anonymous
function of no arguments to be called whenever there is any change in
the values of p
contained in its selected
state. This means that the function is called if any point in
p
is selected or deselected.
Note also that the smooth is based on the temporary
x
and y
values. Points in a
scatterplot may be moved by selecting them with the
control button depressed (as well as shift for multiple selection).
Alternatively, selected points may be pushed together,
distributed vertically or horizontally, arranged on a grid, or
jittered by selecting the corresponding move
button from the Modify
panel of the inspector. All points
can be returned to their original position by clicking the recover
button
# Here we "bind" the anonymous to the named state changes of p
l_bind_state(p, c("selected"),
function() {updateSmooth(p, 10, 5, "blue")}
)
Now go to the histogram and select first the female bar, then the make bar, and watch how the smoothing spline adapts to the sex. Clearly, females have greater changes in spinal bone mineral density at a younger age than do males. No doubt this is a consequence of the different ages at which girls and boys sexually mature.
Brushing any set of points, from any of the linked plots will now
cause the smooth function to automatically recalculate and redisplay. In
this way one might pursue, for example how the smooth changes over
subsets of idnum
values.
Note that the points must be both active and selected. We could, for example, focus on how the smooth changes only for any subset of females by first deactivating all males and then brushing the subset of the females.
Note that the states that are bound can be seen as
l_bind_state_ids(p)
and deleted using
l_bind_state_delete(p, "stateBinding0")
.
Linear smoothers can be thought of as the connected predicted values of locally fitted linear models having weights which are maximal at the point \(x\) where the prediction is being made and which drop off to zero for \(x\) values far away from it.
To illustrate this we add a straight line to the scatterplot that is fitted to the data via weighted least squares using Gaussian weights. For any collection of \(x\) values, the prediction will be made at their median.
The Gaussian weight function will be centred at the median:
GaussWt <- function(x) {
# Get an estimated standard deviation
h <- diff(range(x))/4
# Centre at median
xloc <- median(x)
# Gaussian density
dnorm(x, mean=xloc, sd=h)
}
Use these weights to fit a line to the data.
# Fit a local line using some Gaussian weights.
# Prediction will be at the median of x, fit by
### weights that decrease with x's
# distance from the median.
fitwls <- lm(y ~ x, weights=GaussWt(x))
linewls <- l_layer_line(p,
x=x,
y=predict(fitwls,
newdata=data.frame(x=x)),
label="Fitted line",
linewidth=4,
color = "blue")
Clicking on the Layers
tab in the inspector shows the
scatterplot, the axis, the smooth fit, and the Gaussian weight straight
line. Select the last of these and render it invisible by clicking on
that button, or, by programmatically by executing the following.
Now make the fitted line update to fit only the selected points.
updateLocalLine <- function(myPlot, minpts, df, volor="blue") {
## Get the values for x and y from the plot
## For x
xnew <- myPlot["xTemp"]
if (length(xnew) == 0) {xnew <- myPlot["x"]}
## For y
ynew <- myPlot["yTemp"]
if (length(ynew) == 0) {ynew <- myPlot["y"]}
## Now **only** use the active selected points to construct the smooth
sel <- myPlot["selected"] & myPlot["active"]
xnew <- xnew[sel]
ynew <- ynew[sel]
Nsel <- sum(sel)
if (Nsel > 3 & diff(range(xnew)) > 0) {
xrng <- extendrange(xnew)
xvals.temp <- seq(from=min(xrng),
to=max(xrng),
length.out=100)
## Redo line if more than two points.
if (Nsel> 2) {
fit.wls <- lm(ynew ~ xnew, weights=GaussWt(xnew))
ywls.temp <- predict(fit.wls,
newdata=data.frame(xnew=xvals.temp))
## update the fit
if (linewls %in% l_layer_ids(myPlot)) {
l_configure(linewls, x=xvals.temp, y=ywls.temp)
} else {
## If it's been deleted, we recreate it (in the global environment).
linewls <<- l_layer_line(myPlot,
x=xvals.temp,
y=predict(fitwls,
newdata=data.frame(x=xvals.temp)
),
label="GaussWt at median line",
linewidth=4,
color="blue"
)
}
}
}
## Update the tcl language's event handler
tcl("update", "idletasks")
}
And now bind this update function to change in p
of
either the select
or the active
states.
# Here we "bind" the anonymous to the named state changes of p
l_bind_state(p, c("active","selected"),
function() {updateLocalLine(p, 10, 5, "blue")}
)
Selecting the male or female sex in the histograms will show the
weighted least squares line for that sex. Brushing a tall thin vertical
brush on p2
will show how the fitted line changes (or not)
as the similar idnum
s change. But most interestingly, and
the object of this lesson, is to brush p2
with a short very
wide brush so that the age
can be kept roughly constant. As
age
increases or decreases, the line segment changes its
fit: both in height and in slope. The smooth seen earlier is essentially
the connected midpoints of these line segments.