# Load packages
library(tourr)
library(cassowaryr)
library(spinebil)
library(ferrn)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

stringy05 is suitable as a projection pursuit index
16 Jun 2026
Before using stringy05 inside a guided tour, I first check whether it behaves like a useful projection pursuit index.
For data matrix \(X \in \mathbb{R}^{n \times p}\) and projection basis \(A \in \mathbb{R}^{p \times 2}\), the projected data are
\[ Y = XA. \]
The projection pursuit index is then calculated on \(Y\).
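As a minimal sketch of this step in base R, the snippet below projects a small simulated matrix onto an axis-aligned basis. The names `X_demo` and `A_demo`, and the dimensions, are illustrative assumptions rather than objects from the analysis.

```r
# Illustrative only: X_demo and A_demo are hypothetical stand-ins.
set.seed(1)
n <- 5
p <- 4
X_demo <- matrix(rnorm(n * p), nrow = n, ncol = p)

# A p x 2 basis that selects variables 2 and 3.
A_demo <- matrix(0, nrow = p, ncol = 2)
A_demo[2, 1] <- 1
A_demo[3, 2] <- 1

# A projection basis should have orthonormal columns: t(A) %*% A = I.
is_orthonormal <- isTRUE(all.equal(crossprod(A_demo), diag(2)))

# Projected data: an n x 2 matrix on which the index is evaluated.
Y_demo <- X_demo %*% A_demo
dim(Y_demo)
```

With an axis-aligned basis like this one, the projection simply picks out two columns of the data; a general basis mixes all \(p\) variables.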
Here I compare two versions of the index:
- stringy05(rescale = FALSE): the original stringy05 index.
- stringy05(rescale = TRUE): a rescaled version designed to reduce the influence of values that commonly arise from Gaussian noise.

In previous work, I estimated the distribution of the stringy05 index under bivariate Gaussian noise across different sample sizes. Using the estimated upper tail of this noise distribution, I derived a sample-size-dependent threshold
\[ \ell(n) = 0.05 + \frac{3.86}{\sqrt{n}}. \]
The rescaled index is then defined as
\[ I_{\text{rescaled}} = \max \left( 0, \frac{I - \ell(n)} {1 - \ell(n)} \right), \]
where \(I\) is the original stringy05 value.
This transformation preserves the upper end of the index while shrinking values that are likely to arise from Gaussian noise. Values below the estimated noise threshold are mapped to zero, and values above the threshold are linearly rescaled to the interval \([0,1]\).
The motivation is that random noise projections can sometimes produce non-zero stringy values, potentially making optimisation more difficult. By incorporating an estimated noise threshold, the rescaled version aims to increase separation between structured projections and noise projections, allowing the optimiser to focus more strongly on projections containing genuine stringy structure.
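To get a feel for the size of the threshold, here is a small worked sketch in base R. The helper names `noise_threshold` and `rescale_demo` and the raw values 0.20, 0.90 and 1.00 are made up for illustration; only the formula comes from the text above.

```r
# Noise threshold l(n) = 0.05 + 3.86 / sqrt(n), as in the formula above.
noise_threshold <- function(n) 0.05 + 3.86 / sqrt(n)

# Rescale a raw index value given the sample size n.
rescale_demo <- function(index_value, n) {
  lb <- noise_threshold(n)
  pmax(0, (index_value - lb) / (1 - lb))
}

# The threshold shrinks as the sample size grows.
thresholds <- sapply(c(100, 300, 1000), noise_threshold)
round(thresholds, 3)

# Hypothetical raw values at n = 300: one below the threshold, two above.
rescale_demo(c(0.20, 0.90, 1.00), n = 300)
```

At \(n = 300\) the threshold is roughly 0.27, so a raw value of 0.20 is mapped to zero while a raw value of 1 stays at 1.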
# Rescaling function for stringy05
rescale_stringy05 <- function(z, n) {
  lb <- 0.05 + 3.86 / sqrt(n)
  pmax(0, (z - lb) / (1 - lb))
}

stringy05_raw <- function(mat) {
  cassowaryr::sc_stringy05(mat[, 1], mat[, 2])
}

stringy05_rescaled <- function(mat) {
  z <- cassowaryr::sc_stringy05(mat[, 1], mat[, 2])
  rescale_stringy05(z, nrow(mat))
}

I create data where only two variables contain the hidden polynomial structure. The signal is placed in variables 2 and 3, while variables 1 and 4 are Gaussian noise. I also add a small amount of noise to the polynomial variables so the structure is not perfectly deterministic.
make_poly_data <- function(n = 300, p = 4, degree = 2, seed = 1050,
                           signal_noise_sd = 0.005) {
  set.seed(seed)
  t <- seq(-1, 1, length.out = n)
  signal <- poly(t, degree = degree, raw = TRUE)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  x[, 2] <- signal[, 1] + rnorm(n, sd = signal_noise_sd)
  x[, 3] <- signal[, 2] + rnorm(n, sd = signal_noise_sd)
  colnames(x) <- paste0("V", seq_len(p))
  as.data.frame(x)
}

dat4 <- make_poly_data(n = 300, p = 4, degree = 2, seed = 1050)

The true interesting projection is the plane spanned by variables 2 and 3.
Before running the guided tour, I first plot the two signal variables directly. This is the target structure that I want the guided tour to find.
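A plot of this kind could be produced along the following lines. This is only a sketch: `signal_demo` is a compact stand-in that mimics the two signal variables (linear and quadratic in `t`, with noise of standard deviation 0.005), not the exact `dat4` object, because the random draws are made in a different order.

```r
library(ggplot2)

# Compact stand-in for the signal variables: V2 is linear in t, V3 is quadratic.
set.seed(1050)
n <- 300
t_grid <- seq(-1, 1, length.out = n)
signal_demo <- data.frame(
  V2 = t_grid + rnorm(n, sd = 0.005),
  V3 = t_grid^2 + rnorm(n, sd = 0.005)
)

# Scatterplot of the two signal variables; the points trace a noisy parabola.
p_signal <- ggplot(signal_demo, aes(x = V2, y = V3)) +
  geom_point(size = 0.8) +
  theme(aspect.ratio = 1) +
  labs(x = "V2", y = "V3")
p_signal
```

With the real data, the same call on `dat4` with `aes(x = V2, y = V3)` should show the structure the tour is expected to recover.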
A useful projection pursuit index should give a higher value to the structured projection than to a noise projection.
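# Axis-aligned bases for the two planes. These definitions are not shown in
# the original listing and are assumed from the data description: variables 2
# and 3 carry the signal, variables 1 and 4 are noise.
basis_true <- matrix(0, nrow = 4, ncol = 2)
basis_true[2, 1] <- 1
basis_true[3, 2] <- 1
basis_noise <- matrix(0, nrow = 4, ncol = 2)
basis_noise[1, 1] <- 1
basis_noise[4, 2] <- 1

direct_check <- tibble(
  projection = c("true polynomial plane", "noise plane"),
  raw = c(
    stringy05_raw(as.matrix(dat4) %*% basis_true),
    stringy05_raw(as.matrix(dat4) %*% basis_noise)
  ),
  rescaled = c(
    stringy05_rescaled(as.matrix(dat4) %*% basis_true),
    stringy05_rescaled(as.matrix(dat4) %*% basis_noise)
  )
)
knitr::kable(direct_check)

| projection | raw | rescaled |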
|---|---|---|
| true polynomial plane | 0.9880098 | 0.9835106 |
| noise plane | 0.2421718 | 0.0000000 |
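If stringy05 is useful as a PPI, the true polynomial plane should have a larger value than the noise plane. This holds for both versions: the raw index is 0.988 on the true plane against 0.242 on the noise plane, and the rescaled index is 0.984 against 0, because 0.242 lies below the threshold \(\ell(300) \approx 0.273\).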
spinebil

For a 2D guided tour, the index should measure the plane, not the orientation inside the plane. This means rotating the projected 2D data should not substantially change the index value.
Here I use spinebil::profile_rotation() to rotate the true 2D polynomial projection and calculate both index versions at each rotation step.
d_true <- as.matrix(dat4) %*% basis_true
index_list <- list(
  stringy05_raw,
  stringy05_rescaled
)
index_labels <- c(
  "stringy05 raw",
  "stringy05 rescaled"
)
rotation_values <- spinebil::profile_rotation(
  d = d_true,
  index_list = index_list,
  index_labels = index_labels,
  n = 200
)
spinebil::plot_rotation(rotation_values)
The rotation profile is almost flat for both the raw and rescaled versions of stringy05. This suggests that the index is approximately rotation invariant: rotating the same 2D projection does not substantially change the index value. This is a good property for a projection pursuit index because the guided tour should evaluate the projection plane itself, not the arbitrary orientation of the points inside the 2D display.
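To make the idea of a rotation profile concrete, here is a hand-rolled sketch in base R. It is not how spinebil implements its profile; `rotate_2d`, `profile_by_hand` and `toy_index` are hypothetical helpers, and the toy index (mean distance from the centroid) is chosen only because it is exactly rotation invariant, which makes the flat profile easy to verify.

```r
# Rotate an n x 2 matrix of points counter-clockwise by theta (radians).
rotate_2d <- function(d, theta) {
  rot <- matrix(c(cos(theta), sin(theta),
                  -sin(theta), cos(theta)), nrow = 2)
  d %*% t(rot)
}

# Evaluate index_fun on the rotated data over a grid of angles.
profile_by_hand <- function(d, index_fun, n_angles = 50) {
  angles <- seq(0, 2 * pi, length.out = n_angles)
  values <- vapply(angles, function(a) index_fun(rotate_2d(d, a)), numeric(1))
  data.frame(angle = angles, value = values)
}

# Toy index (hypothetical): mean distance of the points from their centroid.
toy_index <- function(mat) {
  centred <- sweep(mat, 2, colMeans(mat))
  mean(sqrt(rowSums(centred^2)))
}

# Noise-free parabola as example data.
x_demo <- seq(-1, 1, length.out = 100)
d_demo <- cbind(x_demo, x_demo^2)

prof <- profile_by_hand(d_demo, toy_index)
diff(range(prof$value))  # spread of the index across all rotation angles
```

Passing an index such as `stringy05_raw` in place of `toy_index` would give a comparable profile for the real index, where small deviations from a flat line indicate how far it is from exact rotation invariance.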
Next, I examine how the index changes along an interpolated planned tour path from a noise projection to the true polynomial projection.
A useful index should increase as the projection approaches the hidden structure. If the index only increases very close to the true projection, it may be difficult for an optimiser to find the structure.
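One way to build such a path by hand is sketched below in base R. It uses a linear blend of the two bases followed by re-orthonormalisation, which is a simplification and not the geodesic interpolation used by tourr. The helpers `axis_basis`, `plane_dist` and `blend_basis` are hypothetical names, and the start and end planes are assumed to be the axis-aligned noise plane (variables 1 and 4) and signal plane (variables 2 and 3).

```r
# Orthonormal p x 2 basis made of two coordinate axes.
axis_basis <- function(p, i, j) {
  b <- matrix(0, nrow = p, ncol = 2)
  b[i, 1] <- 1
  b[j, 2] <- 1
  b
}

# Frobenius distance between the projection matrices of two bases.
plane_dist <- function(a, b) {
  sqrt(sum((a %*% t(a) - b %*% t(b))^2))
}

# Linear blend of two bases, re-orthonormalised with a QR decomposition.
# Simplification: this is not the geodesic interpolation used by tourr.
blend_basis <- function(a, b, w) {
  qr.Q(qr((1 - w) * a + w * b))
}

basis_start <- axis_basis(4, 1, 4)  # assumed noise plane: variables 1 and 4
basis_end <- axis_basis(4, 2, 3)    # assumed signal plane: variables 2 and 3

# Sequence of bases moving from the noise plane to the signal plane.
weights <- seq(0, 1, length.out = 21)
path <- lapply(weights, function(w) blend_basis(basis_start, basis_end, w))

# Distance of each step from the target plane; it shrinks to zero.
dist_to_end <- vapply(path, plane_dist, numeric(1), b = basis_end)
```

Evaluating an index at each step, for example `stringy05_raw(as.matrix(dat4) %*% b)` for every `b` in `path`, and plotting the values against `dist_to_end` would then show how early the index starts to rise.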

Both the raw and rescaled versions remain relatively flat until very close to the true projection, where the index increases sharply. This suggests that stringy05 has relatively low squintability, so the structure may be hard for an optimiser to locate from a random starting projection.
Squintability checks how early the index starts improving when moving from random projections toward the best projection. Here the theoretical best projection is the plane spanned by variables 2 and 3.
stringy05_raw <- function() {
  function(mat) {
    cassowaryr::sc_stringy05(mat[, 1], mat[, 2])
  }
}

stringy05_rescaled <- function() {
  function(mat) {
    z <- cassowaryr::sc_stringy05(mat[, 1], mat[, 2])
    rescale_stringy05(z, nrow(mat))
  }
}

set.seed(1050)
dat4_mat <- as.matrix(dat4)
basis_squint_raw <- ferrn::sample_bases(
  idx = "stringy05_raw",
  data = dat4_mat,
  n_basis = 20,
  best = basis_true,
  min_proj_dist = 0.5,
  step_size = 0.02,
  parallel = FALSE,
  seed = 1050
)

basis_squint_rescaled <- ferrn::sample_bases(
  idx = "stringy05_rescaled",
  data = dat4_mat,
  n_basis = 20,
  best = basis_true,
  min_proj_dist = 0.5,
  step_size = 0.02,
  parallel = FALSE,
  seed = 1050
)

squint_raw <- ferrn::calc_squintability(
  basis_squint_raw,
  method = "ks",
  bin_width = 0.02
)

squint_rescaled <- ferrn::calc_squintability(
  basis_squint_rescaled,
  method = "ks",
  bin_width = 0.02
)

squint_raw

# PP index: stringy05_raw
# No. of bases: 20 -> 1454
# method: ks
  max_x max_d squint
  <dbl> <dbl>  <dbl>
1 0.284  2.01  0.571

squint_rescaled

# PP index: stringy05_rescaled
# No. of bases: 20 -> 1454
# method: ks
  max_x max_d squint
  <dbl> <dbl>  <dbl>
1 0.284  2.14  0.609
To be written …