
[Experimental] Creates a derived trait function from a natural-language description. Because LLM output can be unreliable, verify the result manually. If the description is unclear, an error is raised; the descriptions of the built-in traits are a good source of wording ideas. Currently, only prop(), ratio(), and wmean() are supported.

This feature requires the ellmer package. DeepSeek is the default provider for backward compatibility. Other ellmer providers can be selected with provider and model, together with the provider-specific API key configuration.

Usage

make_trait(
  description,
  custom_mp = NULL,
  max_retries = 2,
  verbose = FALSE,
  provider = getOption("glydet.ai_provider", "deepseek"),
  model = getOption("glydet.ai_model", NULL),
  api_key = getOption("glydet.ai_api_key", NULL),
  base_url = getOption("glydet.ai_base_url", NULL)
)

Arguments

description

A description of the trait in natural language.

custom_mp

A named character vector of custom meta-properties. The names are the meta-property names, and the values are in the format "(type) description". For example: c(nE = "(integer) number of a2,6-linked sialic acids"). These custom meta-properties will be available for the LLM to use. Note that defining the meta-properties here is not enough to use them: you also need to define the corresponding meta-property functions or specify meta-property columns. For more information about custom meta-properties, see the vignette Custom Meta-Properties.

max_retries

Maximum number of reflection retries when the AI-generated formula's explanation doesn't match the original description. Default is 2.

verbose

Whether to print verbose output. Default is FALSE. This is useful for inspecting how LLMs generate trait functions.

provider

AI provider passed to ellmer. One of "deepseek", "openai", "anthropic", "gemini", "openrouter", or "openai_compatible". "google_gemini" is accepted as an alias for "gemini". Defaults to getOption("glydet.ai_provider", "deepseek").

model

Model to use. Defaults to getOption("glydet.ai_model"). If that is NULL, "deepseek-chat" is used for DeepSeek and the provider's default model is used for other providers.

api_key

API key for the selected provider. If NULL, the provider-specific environment variable is used. Defaults to getOption("glydet.ai_api_key").

base_url

Optional base URL for custom or OpenAI-compatible endpoints. Defaults to getOption("glydet.ai_base_url").
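
Because provider, model, api_key, and base_url all read their defaults from R options, they can be set once per session instead of on every call. The following is a minimal sketch that relies only on the option names shown in Usage; the model name and endpoint URL are hypothetical placeholders, not tested or recommended settings.

```r
# Set session-wide defaults that make_trait() reads via getOption().
# The values below are illustrative placeholders only.
old <- options(
  glydet.ai_provider = "openai_compatible",
  glydet.ai_model    = "my-model",                  # hypothetical model name
  glydet.ai_base_url = "http://localhost:8000/v1"   # hypothetical endpoint
)

# These calls mirror the default expressions in the Usage signature.
provider <- getOption("glydet.ai_provider", "deepseek")
model    <- getOption("glydet.ai_model", NULL)
base_url <- getOption("glydet.ai_base_url", NULL)

# Restore the previous option values when done.
options(old)
```

Saving the return value of options() and passing it back afterwards keeps the change local, which is useful in scripts that should not leak settings into the rest of the session.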

Value

A derived trait function.

Examples

# Sys.setenv(DEEPSEEK_API_KEY = "your_api_key")
# my_traits <- list(
#   nS = make_trait("the average number of sialic acids"),
#   nG = make_trait("the average number of galactoses")
# )

# The trait function can then be used in `derive_traits()`:
# derive_traits(exp, trait_fns = my_traits)
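
The custom_mp argument expects a named character vector whose values follow the "(type) description" format. The sketch below builds such a vector from the example given under Arguments and runs a light format check; the check and the trait description passed to make_trait() are illustrative assumptions, not part of glydet. The make_trait() call itself is left commented out because it needs the ellmer package and an API key.

```r
# Named character vector: names are meta-property names,
# values follow the "(type) description" format.
custom_mp <- c(
  nE = "(integer) number of a2,6-linked sialic acids"
)

# Illustrative sanity check on the format (not a glydet function):
# every value must start with a parenthesised type, then a description.
has_type_prefix <- grepl("^\\([a-z]+\\) .+", custom_mp)
stopifnot(!is.null(names(custom_mp)), all(has_type_prefix))

# Requires ellmer and an API key, so it is not run here:
# nE_trait <- make_trait(
#   "the average number of a2,6-linked sialic acids",
#   custom_mp = custom_mp
# )
```

As noted under custom_mp, declaring nE here only makes it visible to the LLM; a matching meta-property function or column still has to be supplied before the resulting trait can be computed.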