Package 'lionfish'

Title: Interactive 'tourr' Using 'python'
Description: Extends the functionality of the 'tourr' package with an interactive graphical user interface. The interactivity lets users refine their 'tourr' results by manual intervention, which allows expert knowledge to be integrated and aids the interpretation of results. For more information on 'tourr' see Wickham et al. (2011) <doi:10.18637/jss.v040.i02> or <https://github.com/ggobi/tourr>.
Authors: Matthias Medl [aut, cre]
Maintainer: Matthias Medl <[email protected]>
License: MIT + file LICENSE
Version: 1.0.27
Built: 2025-03-14 04:50:07 UTC
Source: https://github.com/mmedl94/lionfish


Australian Vacation Activities Dataset

Description

Australian Vacation Activities Dataset

Usage

ausActiv

Format

A 1003 x 44 array of binary responses

Details

The Australian Vacation Activities dataset includes responses from 1,003 adult Australians who were surveyed on their vacation activities through a permission-based internet panel in 2007. The responses are coded in binary, with 0 indicating that the tourist did not partake in the activity and 1 indicating that they did.

Source

https://statistik.boku.ac.at/nachlass_leisch/MSA/

Examples

data(ausActiv)
head(ausActiv)
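
Because the responses are coded as 0/1, column means give the share of respondents who took part in each activity and row sums give the number of activities per respondent. The following is a minimal sketch on a small made-up matrix; the activity names and values are hypothetical stand-ins, not taken from ausActiv.

```r
# Hypothetical 4 x 3 binary response matrix (not real survey data)
toy <- matrix(
  c(1, 0, 1, 1,   # swimming
    0, 0, 1, 0,   # hiking
    1, 1, 1, 0),  # museum
  nrow = 4,
  dimnames = list(NULL, c("swimming", "hiking", "museum"))
)

# Share of respondents who partook in each activity
participation <- colMeans(toy)

# Number of activities each respondent partook in
n_activities <- rowSums(toy)

print(participation)
print(n_activities)
```

The same two calls should work on ausActiv itself, assuming it loads as a numeric 0/1 matrix or data frame as the Format section suggests.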

Check Whether 'anaconda' Environment Exists

Description

Checks whether an 'anaconda' environment with the given name is installed and returns TRUE if so.

Usage

check_conda_env(env_name = "r-lionfish")

Arguments

env_name

a string that defines the name of the 'anaconda' environment reticulate uses.

Value

logical; TRUE if the environment exists, FALSE otherwise

Examples

check_conda_env(env_name="r-lionfish")

Check Whether Virtual 'python' Environment Exists

Description

Checks whether a virtual 'python' environment with the given name exists and returns TRUE if it does.

Usage

check_venv(env_name = "r-lionfish")

Arguments

env_name

a string that defines the name of the 'python' environment reticulate uses.

Value

logical; TRUE if the environment exists, FALSE otherwise

Examples

check_venv(env_name="r-lionfish")
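
A comparable check can be sketched directly with 'reticulate'. This is only an illustration of the idea and is not necessarily how check_venv is implemented; it assumes that 'reticulate' is installed and falls back to FALSE when it is not or when the lookup fails.

```r
env_name <- "r-lionfish"

# Assumption: reticulate::virtualenv_exists() is used as a stand-in
# for the package's own check; the result is always a single logical.
venv_found <- FALSE
if (requireNamespace("reticulate", quietly = TRUE)) {
  venv_found <- tryCatch(
    isTRUE(reticulate::virtualenv_exists(env_name)),
    error = function(e) FALSE
  )
}

print(venv_found)
```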

Chemical Manufacturing Process Dataset

Description

Chemical Manufacturing Process Dataset

Usage

ChemicalManufacturingProcess

Format

A 176 x 58 array of continuous variables

Details

This dataset contains information about a chemical manufacturing process, in which the goal is to understand the relationship between the process and the resulting final product yield. It can also be found in the AppliedPredictiveModeling R package. The data has been copied from http://appliedpredictivemodeling.com/data in agreement with their license.

Source

http://appliedpredictivemodeling.com/data

Examples

data(ChemicalManufacturingProcess)
head(ChemicalManufacturingProcess)

Get Guided Tour-Holes-Better History

Description

Returns a guided tour with the holes index and the 'search_better' argument to the 'python' backend. This guided tour is generated with the 'tourr' functions 'save_history' and 'guided_tour'.

Usage

get_guided_holes_better_history(data, dimension)

Arguments

data

the dataset to calculate the projections with.

dimension

1 for a 1d tour or 2 for a 2d tour

Value

history object containing the projections of the requested tour

Examples

data("flea", package = "tourr")
flea <- flea[-7]
get_guided_holes_better_history(flea, 2)
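
For orientation, a tour of this kind can be sketched directly with 'tourr'. The call below is an assumption about what such a history could look like, not the package's actual implementation; max_bases = 5 is chosen only to keep the run short.

```r
library(tourr)

data("flea", package = "tourr")
flea_num <- flea[, 1:6]  # numeric columns only

# Guided tour on the holes index, optimised with search_better,
# recorded as a history array of projection bases
history <- save_history(
  flea_num,
  tour_path = guided_tour(holes(), d = 2, search_f = search_better),
  max_bases = 5
)

# Dimensions: variables x projection dimension x recorded bases
dim(history)
```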

Get Guided Tour-Holes History

Description

Returns a guided tour with the holes index to the 'python' backend. This guided tour is generated with the 'tourr' functions 'save_history' and 'guided_tour'.

Usage

get_guided_holes_history(data, dimension)

Arguments

data

the dataset to calculate the projections with.

dimension

1 for a 1d tour or 2 for a 2d tour

Value

history object containing the projections of the requested tour

Examples

data("flea", package = "tourr")
flea <- flea[-7]
get_guided_holes_history(flea, 2)

Get Guided Tour-LDA History

Description

Returns a guided tour with the LDA index to the 'python' backend. This guided tour is generated with the 'tourr' functions 'save_history' and 'guided_tour'.

Usage

get_guided_lda_history(data, clusters, dimension)

Arguments

data

the dataset to calculate the projections with

clusters

the clusters for the lda to be performed on

dimension

1 for a 1d tour or 2 for a 2d tour

Value

history object containing the projections of the requested tour

Examples

data("flea", package = "tourr")
clusters <- as.numeric(factor(flea[[7]]))
get_guided_lda_history(flea[-7], clusters, 2)

Get Local Tour History

Description

Returns a local tour based on currently displayed projection(s) to the 'python' backend. This local tour is generated with the 'tourr' functions 'save_history' and 'local_tour'.

Usage

get_local_history(data, starting_projection)

Arguments

data

the dataset to calculate the projections with. In practice only the first two rows of the dataset are provided, as the actual data is not needed.

starting_projection

the initial projection one wants to initiate the local tour from

Value

history object containing the projections of the requested tour

Examples

library(tourr)
data("flea", package = "tourr")
flea <- flea[-7]
prj <- tourr::basis_random(ncol(flea), 2)
get_local_history(flea, prj)
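
A local tour history can also be sketched directly with 'tourr'. The snippet is an illustration under the assumption that the defaults of 'local_tour' are acceptable; it is not necessarily identical to what get_local_history does internally, and max_bases = 5 only keeps the run short.

```r
library(tourr)

data("flea", package = "tourr")
flea_num <- as.matrix(flea[, 1:6])

# Random starting projection from 6 variables down to 2 dimensions
start <- basis_random(ncol(flea_num), 2)

# Local tour that moves around the starting projection
history <- save_history(
  flea_num,
  tour_path = local_tour(start),
  max_bases = 5
)

dim(history)
```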

Initialize Environment for 'python' Backend

Description

Initializes the 'python' backend required by lionfish. The function first checks whether a 'python' environment with the provided name exists. If it does, the environment is loaded and the 'python' function 'check_backend' is run to verify that it works properly. If no 'python' environment with the provided name exists, it is installed and then loaded. Installation can be done with or without Anaconda as package manager. Anaconda can be more robust, but the GUI will appear dated, so trying 'init_env' with virtual_env = "virtual_env" first is recommended. For 'Windows' users, the paths to the tk and tcl libraries are set, because tkinter cannot run otherwise.

Usage

init_env(env_name = "r-lionfish", virtual_env = "virtual_env", local = FALSE)

Arguments

env_name

a string that defines the name of the python environment reticulate uses. This can be useful if one wants to use a preinstalled python environment.

virtual_env

either "virtual_env" or "anaconda". "virtual_env" creates a virtual environment, which has the advantage that the GUI looks much nicer and no previous python installation is required, but the setup of the environment can be more error prone. "anaconda" installs the python environment via Anaconda, which can be more stable, but the GUI looks more dated.

local

logical

Value

initializes python environment

Examples

if (check_venv()) {
  init_env(env_name = "r-lionfish", virtual_env = "virtual_env")
} else if (check_conda_env()) {
  init_env(env_name = "r-lionfish", virtual_env = "anaconda")
}
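
The fallback order used in the example (virtual environment first, then Anaconda) can be isolated in a small helper. pick_env_type is a hypothetical name introduced here for illustration and is not part of lionfish.

```r
# Hypothetical helper: chooses the value for the virtual_env argument
# from the results of the two existence checks.
pick_env_type <- function(has_venv, has_conda) {
  if (isTRUE(has_venv)) {
    "virtual_env"
  } else if (isTRUE(has_conda)) {
    "anaconda"
  } else {
    NA_character_  # neither environment was found
  }
}

pick_env_type(TRUE, TRUE)    # "virtual_env"
pick_env_type(FALSE, TRUE)   # "anaconda"
pick_env_type(FALSE, FALSE)  # NA

# Possible use with lionfish (not run here):
# env_type <- pick_env_type(check_venv(), check_conda_env())
# if (!is.na(env_type)) init_env(env_name = "r-lionfish", virtual_env = env_type)
```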

R Wrapper for 'interactive_tour' Function Written in 'python'

Description

Launches the lionfish GUI. At minimum, the data to be loaded and the plot_objects must be provided; the other parameters are optional. For technical reasons the parameters half_range, n_plot_cols, n_subsets, color_scale, label_size and display_size cannot be adjusted from within the GUI. The GUI has to be closed and relaunched (possibly with load_interactive_tour) if you want to change them. Please visit https://mmedl94.github.io/lionfish/index.html for a detailed description of the GUI and its features.

Usage

interactive_tour(
  data,
  plot_objects,
  feature_names = NULL,
  half_range = NULL,
  n_plot_cols = 2,
  preselection = FALSE,
  preselection_names = FALSE,
  n_subsets = 3,
  display_size = 5,
  hover_cutoff = 10,
  label_size = 15,
  color_scale = "default",
  color_scale_heatmap = "default",
  axes_blendout_threshhold = 1
)

Arguments

data

the dataset you want to investigate

plot_objects

a named list of objects you want to be displayed. Each entry requires a definition of the type of display and a specification of what should be plotted.

feature_names

names of the features of the dataset

half_range

factor that influences the scaling of the displayed tour plots. Small values lead to more spread out datapoints (that might not fit the plotting area), while large values lead to the data being more compact. If not provided a good estimate will be calculated and used.

n_plot_cols

specifies the number of columns of the grid of the final display.

preselection

a vector that specifies in which subset each datapoint should be put initially.

preselection_names

a vector that specifies the names of the preselection subsets

n_subsets

the total number of available subsets.

display_size

rough size of each subplot in inches

hover_cutoff

number of features at which the feature labels switch from opaque to transparent; transparent labels become opaque when hovered over

label_size

size of the labels of the feature names of 1d and 2d tours

color_scale

a viridis/matplotlib colormap to define the color scheme of the subgroups

color_scale_heatmap

a viridis/matplotlib colormap to define the color scheme of the heatmap

axes_blendout_threshhold

initial value of the threshold for blending out projection axes with a smaller length

Value

opens the interactive GUI

Examples

library(tourr)
data("flea", package = "tourr")
data <- flea[1:6]
clusters <- as.numeric(flea$species)
flea_subspecies <- unique(flea$species)
feature_names <- colnames(data)

guided_tour_history <- tourr::save_history(data,
  tour_path = tourr::guided_tour(holes())
)
grand_tour_history_1d <- tourr::save_history(data,
  tour_path = tourr::grand_tour(d = 1)
)

half_range <- max(sqrt(rowSums(data^2)))

obj1 <- list(type = "2d_tour", obj = guided_tour_history)
obj2 <- list(type = "1d_tour", obj = grand_tour_history_1d)
obj3 <- list(type = "scatter", obj = c("tars1", "tars2"))
obj4 <- list(type = "hist", obj = "head")

if (check_venv()) {
  init_env(env_name = "r-lionfish", virtual_env = "virtual_env")
} else if (check_conda_env()) {
  init_env(env_name = "r-lionfish", virtual_env = "anaconda")
}

if (interactive()) {
  interactive_tour(
    data = data,
    plot_objects = list(obj1, obj2, obj3, obj4),
    feature_names = feature_names,
    half_range = half_range,
    n_plot_cols = 2,
    preselection = clusters,
    preselection_names = flea_subspecies,
    n_subsets = 5,
    display_size = 5
  )
}
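
Each entry of plot_objects pairs a display type with the object to be plotted, as in obj1 to obj4 above. Before launching the GUI, the structure of the list can be checked with a small helper. validate_plot_objects is a hypothetical function written for this sketch; it only checks that the two fields are present and does not know which type strings the GUI accepts.

```r
# Hypothetical helper: TRUE for every entry that is a list with a
# single character 'type' and an 'obj' field, FALSE otherwise.
validate_plot_objects <- function(plot_objects) {
  vapply(plot_objects, function(entry) {
    is.list(entry) &&
      all(c("type", "obj") %in% names(entry)) &&
      is.character(entry$type) &&
      length(entry$type) == 1
  }, logical(1))
}

objs <- list(
  list(type = "scatter", obj = c("tars1", "tars2")),
  list(type = "hist", obj = "head"),
  list(type = "hist")  # malformed on purpose: 'obj' is missing
)

validate_plot_objects(objs)  # TRUE TRUE FALSE
```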

R Wrapper for 'load_interactive_tour' Function Written in 'python'

Description

Loads a previously saved snapshot created by pressing the "Save projections and subsets" button within the GUI. The data that was loaded when the snapshot was saved has to be provided to this function when loading that snapshot. Additionally, this function allows some parameters of the GUI, such as display_size or label_size, to be adjusted when loading a snapshot.

Usage

load_interactive_tour(
  data,
  directory_to_save,
  half_range = NULL,
  n_plot_cols = 2,
  preselection = FALSE,
  preselection_names = FALSE,
  n_subsets = FALSE,
  display_size = 5,
  hover_cutoff = 10,
  label_size = 15,
  color_scale = "default",
  color_scale_heatmap = "default",
  axes_blendout_threshhold = 1
)

Arguments

data

the dataset you want to investigate. Must be the same as the dataset that was loaded when the save was created!

directory_to_save

path to the location of the save folder

half_range

factor that influences the scaling of the displayed tour plots. Small values lead to more spread out datapoints (that might not fit the plotting area), while large values lead to the data being more compact. If not provided a good estimate will be calculated and used.

n_plot_cols

specifies the number of columns of the grid of the final display.

preselection

a vector that specifies in which subset each datapoint should be put initially.

preselection_names

a vector that specifies the names of the preselection subsets

n_subsets

the total number of available subsets (up to 10).

display_size

rough size of each subplot in inches

hover_cutoff

number of features at which the feature labels switch from opaque to transparent; transparent labels become opaque when hovered over

label_size

size of the labels of the feature names of 1d and 2d tours

color_scale

a viridis/matplotlib colormap to define the color scheme of the subgroups

color_scale_heatmap

a viridis/matplotlib colormap to define the color scheme of the heatmap

axes_blendout_threshhold

initial value of the threshold for blending out projection axes with a smaller length

Value

opens the interactive GUI

Examples

data("flea", package = "tourr")
data <- flea[1:6]

if (check_venv()) {
  init_env(env_name = "r-lionfish", virtual_env = "virtual_env")
} else if (check_conda_env()) {
  init_env(env_name = "r-lionfish", virtual_env = "anaconda")
}

pytourr_dir <- find.package("lionfish", lib.loc = NULL, quiet = TRUE)
pytourr_dir <- paste(pytourr_dir, "/inst/test_snapshot", sep = "")
if (interactive()) {
  load_interactive_tour(data, pytourr_dir)
}

Modification of the 'render_proj' Function of 'tourr'

Description

Modification of the render_proj() function of tourr so that the half_range is calculated with max(sqrt(rowSums(data^2))) or can be provided as argument.

Usage

render_proj_inter(
  data,
  prj,
  half_range = NULL,
  axis_labels = NULL,
  obs_labels = NULL,
  limits = 1,
  position = "center"
)

Arguments

data

matrix, or data frame containing numeric columns, should be standardized to have mean 0, sd 1

prj

projection matrix

half_range

for scaling in the display, by default calculated from the data

axis_labels

labels of the axes to be displayed

obs_labels

labels of the observations to be available for interactive mouseover

limits

value setting the lower and upper limits of projected data, default 1

position

position of the axes: center (default), bottomleft or off

Value

list containing projected data, circle and segments for axes

Examples

library(tourr)
data("flea", package = "tourr")
flea_std <- apply(flea[,1:6], 2, function(x) (x-mean(x))/sd(x))
prj <- tourr::basis_random(ncol(flea[,1:6]), 2)
p <- render_proj_inter(flea_std, prj)
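
The default half_range, max(sqrt(rowSums(data^2))), is the largest Euclidean distance of any observation from the origin. The following sketch works this through on a small made-up matrix and shows that base R's scale() is one way to obtain the standardization (mean 0, sd 1) that the data argument expects; the values are illustrative only.

```r
# Three hypothetical observations in two dimensions
pts <- rbind(c(3, 4), c(1, 0), c(0, -2))

# Row-wise Euclidean norms are 5, 1 and 2, so the half range is 5
half_range <- max(sqrt(rowSums(pts^2)))
print(half_range)

# Standardizing columns to mean 0 and sd 1 with scale()
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
x_std <- scale(x)
print(colMeans(x_std))
print(apply(x_std, 2, sd))

# Half range of the standardized data
half_range_std <- max(sqrt(rowSums(x_std^2)))
print(half_range_std)
```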

Risk Dataset

Description

Risk Dataset

Usage

risk

Format

A 563 x 6 array of Likert scale data

Details

Adult Australian residents who had undertaken at least one holiday in the last year, which involved staying away from home for at least four nights, were asked what risks they had taken in the past. The questions were about recreational, health, career, financial, safety and social risk. The response options were: never (1), rarely (2), quite often (3), often (4) or very often (5).

Source

https://statistik.boku.ac.at/nachlass_leisch/MSA/

Examples

data(risk)
head(risk)
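
The numeric codes can be mapped back to the response labels listed in the Details section with an ordered factor. The sketch below uses a short made-up response vector rather than the risk data itself.

```r
# Labels in the order of their codes 1 to 5, as listed under Details
likert_labels <- c("never", "rarely", "quite often", "often", "very often")

# Hypothetical responses to a single question
responses <- c(1, 3, 5, 2, 2)

labelled <- factor(responses, levels = 1:5, labels = likert_labels,
                   ordered = TRUE)

print(labelled)
print(table(labelled))
```

Applied column by column, the same factor() call should label the risk data, assuming its columns hold the integer codes 1 to 5.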

Austrian Vacation Activities Dataset

Description

Austrian Vacation Activities Dataset

Usage

winterActiv

Format

A 2961 x 27 array of binary responses

Details

The Austrian Vacation Activities dataset comprises responses from 2,961 adult tourists who spent their holiday in Austria during the 1997/98 season. The responses are coded in binary with 0 indicating that the tourist didn't partake in the activity and 1 indicating they did.

Source

https://statistik.boku.ac.at/nachlass_leisch/MSA/

Examples

data(winterActiv)
head(winterActiv)