Configure Regression Model

Trains, saves, loads, and unloads a regression model. Training uses vectors as input features and numeric data as target values. The vectors can be embedding vectors generated from text data.

Options

Action specifies the operation to perform:
- Train: Creates and trains a new regression model using the specified training data and algorithm parameters.
- Save: Saves a trained model as container data to the field or variable specified by the Save Model To option for later use.
- Load: Loads a previously saved model into memory from the container field or variable specified by the Load Model From option.
- Unload: Removes a model from memory to free up resources.
Model Name is a text expression for the unique name of the regression model that the Action option operates on. After training or loading a model, use the same name when referencing the model to perform regression with the PredictFromModel function.

Options available only when Action is Train:

Algorithm selects the machine learning algorithm to use. In version 22.0, Random Forest is the only algorithm provided.
Training Vectors Field specifies the text or container field that contains vectors for the training data.
Training Target Field specifies the field containing the target values (the numerical values you want to predict). Each value corresponds to one training example in Training Vectors Field.
Skip empty or invalid records skips records for which the field specified by Training Vectors Field or Training Target Field is empty or contains invalid data. If not selected, the script step skips all records after the first record with empty or invalid field data and returns an error.
Parameters is a text expression for a JSON object that specifies algorithm-specific parameters as key-value pairs. See Description.

Options available only when Action is Train or Save:

Save Model To specifies a variable or container field to save the trained model to.

Options available only when Action is Load:

Load Model From specifies the variable or container field to load a trained model from.

Compatibility

Product	Supported
FileMaker Pro	Yes
FileMaker Go	Yes
FileMaker WebDirect	Yes
FileMaker Server	Yes
FileMaker Cloud	Yes
FileMaker Data API	Yes
Custom Web Publishing	Yes

Originated in version

22.0

Description

This script step enables you to train and manage regression models that use machine learning algorithms directly within your FileMaker Pro app. Regression models predict continuous numerical values for a dependent output variable (target) based on independent input variables (features), making these models suitable for forecasting, trend analysis, and data-driven decision making.

The Random Forest algorithm is an ensemble learning method that combines multiple decision trees to create more robust and accurate predictions than a single decision tree, making this algorithm suitable for real-world data. Each tree in the forest is trained on a random subset of the training data and features, which helps prevent overfitting (capturing not only patterns from the training data but also random noise) and improves generalization to new data.
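The Python sketch below makes the ensemble idea concrete. It is only a language-neutral illustration and is not how FileMaker implements this script step.

To keep it short, each tree is a depth-1 tree (a "stump"), whereas the script step grows deeper trees (`maxDepth` defaults to 10). Each stump is trained on a bootstrap sample of the rows (drawn with replacement) and on a random subset of the feature columns. The forest then averages the stumps' predictions. The function names are hypothetical. Many random forest implementations re-draw the feature subset at every split rather than once per tree.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree: try each candidate feature and threshold,
    and keep the split with the smallest sum of squared errors (SSE)."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples both ways
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:
        # No usable split (for example, all sampled values are equal): predict the mean.
        m = mean(y)
        return lambda row: m
    _, f, threshold, lm, rm = best
    return lambda row: lm if row[f] <= threshold else rm


def fit_forest(X, y, num_trees=10, features_per_tree=1, seed=0):
    """Train num_trees stumps, each on a bootstrap sample of the rows
    and a random subset of the feature columns."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(num_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), features_per_tree)  # features for this tree
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return trees


def predict(trees, row):
    """Forest prediction: the average of the individual tree predictions."""
    return mean(tree(row) for tree in trees)


# Usage: one feature; targets jump from 0 to 10 between x = 4 and x = 5.
X = [[x] for x in range(10)]
y = [0] * 5 + [10] * 5
forest = fit_forest(X, y, num_trees=25, seed=1)
low, high = predict(forest, [0]), predict(forest, [9])
```

Averaging many trees fit to different samples smooths out the quirks of any single tree. This is the overfitting protection described above.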

When you select the Train action, the script step performs the following operations:

- Parses vectors in Training Vectors Field to extract feature values.
- Applies the specified algorithm Parameters or uses defaults if none are provided.
- Trains the model using the Random Forest algorithm with feature values extracted from Training Vectors Field and the target values from Training Target Field.
- Stores the trained model in memory with the specified Model Name for use with the PredictFromModel function.
- If Save Model To is specified, saves the model for later use.
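The parsing step, together with the validation rules described in this topic, can be sketched in Python. Those rules are:

- Vectors must be JSON arrays with the same number of numeric elements.
- Targets must be numeric.
- Bad records can optionally be skipped.

This is a hypothetical illustration, not FileMaker's parser, and it rests on several assumptions:

- `parse_training_rows` is an invented name.
- Only the text-field form of the vectors is handled.
- The first valid record is assumed to fix the expected vector length.
- The non-skip branch simply raises an error at the first bad record.

```python
import json


def parse_training_rows(rows, skip_invalid=True):
    """rows: iterable of (vector_text, target) pairs.
    Returns (features, targets) for the records that pass validation."""
    features, targets = [], []
    expected_len = None  # assumption: set by the first valid record
    for i, (vector_text, target) in enumerate(rows):
        try:
            vec = json.loads(vector_text)  # empty or malformed text raises here
            if not isinstance(vec, list) or not vec:
                raise ValueError("vector is not a non-empty JSON array")
            if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in vec):
                raise ValueError("vector contains non-numeric elements")
            if expected_len is not None and len(vec) != expected_len:
                raise ValueError("vector length differs from earlier records")
            value = float(target)  # non-numeric targets raise here
        except (TypeError, ValueError) as err:
            if skip_invalid:
                continue  # like "Skip empty or invalid records"
            raise ValueError(f"record {i + 1}: {err}") from err
        if expected_len is None:
            expected_len = len(vec)
        features.append([float(v) for v in vec])
        targets.append(value)
    return features, targets
```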

The Save, Load, and Unload actions let you manage your trained models, saving them only when needed and, to optimize performance, keeping them in memory only while they're being used.
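The life cycle of a named model can be pictured as an in-memory store keyed by the model name. The Python sketch below is hypothetical. The `ModelRegistry` class, its methods, and the JSON bytes that stand in for container data are invented for illustration.

The sketch reflects two documented behaviors:

- Model names are case-sensitive.
- Training under an existing name replaces the previous model.

```python
import json


class ModelRegistry:
    """Hypothetical in-memory store of models keyed by case-sensitive name."""

    def __init__(self):
        self._models = {}

    def train(self, name, model):
        # Training under an existing name replaces the previous model.
        self._models[name] = model

    def save(self, name):
        # Serialize to bytes, standing in for container data.
        return json.dumps(self._models[name]).encode("utf-8")

    def load(self, name, data):
        # Rebuild the model from saved bytes and keep it in memory.
        self._models[name] = json.loads(data.decode("utf-8"))

    def unload(self, name):
        # Drop the in-memory copy; previously saved bytes are unaffected.
        self._models.pop(name, None)

    def is_loaded(self, name):
        return name in self._models
```

Here a "model" is any JSON-serializable placeholder, such as a dictionary. The point is only that saving produces data that outlives an unload, and loading brings it back under a name.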

For the Random Forest algorithm, you can use the following keys and values in the Parameters option to adjust training, if needed. If a key isn't specified or the Parameters option isn't used, the script step uses the default values.

Parameter	Description	Default value
`numTrees`	Number of decision trees in the random forest. More trees generally improve accuracy but increase training time and memory usage.	10
`maxDepth`	Maximum depth of each decision tree. The tree may not reach this depth when training. Deeper trees can capture more complex patterns but may overfit to training data.	10
`minSamplesSplit`	Minimum number of samples required to split an internal node. Higher values can prevent overfitting.	2
`numFeatures`	If positive, the total number of possible features to use for training a single tree (valid range is 1 ≤ numFeatures < 1536). If negative, uses all features for training.	-1
`maxFeatures`	Maximum number of features to use for training a single tree, given as a code: 0 = `numFeatures`; 1 = Sqrt(`numFeatures`); 2 = log₂(`numFeatures`)	1
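The following Python sketch is one hypothetical reading of the table, showing how `numFeatures` and `maxFeatures` combine. The documentation doesn't specify the following details, so the sketch makes these assumptions:

- The result is rounded down, with a minimum of one feature.
- A positive `numFeatures` is capped at the actual vector length.
- `numFeatures` = 0 is rejected.

```python
import math


def features_per_tree(num_features, vector_length, max_features=1):
    """Hypothetical reading of numFeatures and maxFeatures."""
    if num_features < 0:
        n = vector_length  # negative: use all features
    elif 1 <= num_features < 1536:
        n = min(num_features, vector_length)  # assumption: capped at vector length
    else:
        raise ValueError("numFeatures must be negative or 1 to 1535")
    if max_features == 0:
        k = n
    elif max_features == 1:
        k = math.sqrt(n)
    elif max_features == 2:
        k = math.log2(n)
    else:
        raise ValueError("maxFeatures must be 0, 1, or 2")
    return max(1, int(k))  # assumption: round down, never below one feature
```

With the defaults (`numFeatures` -1, `maxFeatures` 1) and 384-element vectors, this reading gives int(Sqrt(384)) = 19 features per tree.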

For example, this JSON object sets all the keys in the Parameters option:


{
    "numTrees" : 15,
    "maxDepth" : 15,
    "minSamplesSplit" : 3,
    "numFeatures" : 1000,
    "maxFeatures" : 0
}
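Omitted keys fall back to their defaults, so the effective settings are the default values overlaid with whatever keys you supply. A minimal Python sketch of that merge follows.

`effective_parameters` is a hypothetical name. Rejecting unrecognized keys, such as a miscapitalized `numtrees`, is this sketch's own choice; the documentation doesn't say how the script step treats unknown keys.

```python
import json

# Default values from the Parameters table.
DEFAULTS = {
    "numTrees": 10,
    "maxDepth": 10,
    "minSamplesSplit": 2,
    "numFeatures": -1,
    "maxFeatures": 1,
}


def effective_parameters(parameters_json=""):
    """Overlay the supplied JSON object on the defaults."""
    supplied = json.loads(parameters_json) if parameters_json.strip() else {}
    unknown = set(supplied) - set(DEFAULTS)
    if unknown:
        # Assumption: treat unrecognized keys (often typos) as errors.
        raise ValueError(f"unrecognized keys: {sorted(unknown)}")
    return {**DEFAULTS, **supplied}
```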

Notes

Values in Training Vectors Field must be provided as valid JSON arrays in a text field or the equivalent binary data in a container field. Each array must contain the same number of elements (features) across all records.

You can use the Insert Embedding in Found Set script step to generate text embedding vectors in Training Vectors Field based on another field that contains your input data. The embedding vectors generated by Insert Embedding in Found Set meet the above requirements when using a supported text embedding model.
Values in Training Target Field must be numerical. Non-numerical values will cause the training to fail.
Model names are case-sensitive and must be unique within the current FileMaker session. If a model with the same name already exists when training, it will be replaced with the new model.
To test the quality of your trained model, use the PredictFromModel function to get predicted values for the same vector data you used to train the model. Then compare those predictions to the target values used during training. Because the model has already seen this data, the comparison shows how well the model fits its training data, which is usually better than how it performs on new data.

One way to measure overall model quality is mean squared error (MSE), the average of the squared differences between the predicted values and the target values. An MSE of zero means the model's predictions match the actual values exactly, so lower MSE values are better. What counts as a good MSE depends on the scale of your target variable and the error margin that's acceptable for your application. See Example 3 for a script that calculates MSE.
Performance considerations:
- Training time increases with the number of trees (numTrees) and maximum depth (maxDepth). Start with the default values for Parameters and adjust them based on your accuracy requirements and performance constraints.
- Larger data sets require more memory during training. Consider using a representative sample for initial model training if working with very large data sets.
- Models remain in memory until explicitly unloaded or the FileMaker session ends. Use the Unload action to free up memory when models are no longer needed.
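One simple way to draw a sample for initial training is a seeded simple random sample of record IDs, sketched below in Python. `sample_record_ids` is a hypothetical helper, and restricting the found set to those records in FileMaker is left to your solution.

A simple random sample is representative only on average. For skewed target values, sampling within ranges of the target (stratified sampling) may be a better choice.

```python
import random


def sample_record_ids(record_ids, sample_size, seed=42):
    """Return a reproducible simple random sample of record IDs
    (all IDs if sample_size is at least the number of records)."""
    ids = list(record_ids)
    if sample_size >= len(ids):
        return ids
    rng = random.Random(seed)  # fixed seed: same sample on every run
    return rng.sample(ids, sample_size)
```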

Example 1 - Train a basic model

Trains a basic regression model to predict house prices based on square footage, number of bedrooms, and age of the house using the default Random Forest parameters.

In the Properties table, the Features field contains JSON arrays like [1200, 3, 15] representing square footage, bedrooms, and age. The Price field contains the corresponding house price. After this script step runs, the model is in memory. The PredictFromModel function can then use it for the rest of the current FileMaker session by referencing the model name "HousePriceModel."

Because the model hasn't been saved in this example, the model won't be available after the current session ends.


Go to Layout [ "Properties" (Properties) ; Animation: None ]

Configure Regression Model [ Action: Train ; Model Name: "HousePriceModel" ; Algorithm: Random Forest ; Training Vectors Field: Properties::Features ; Training Target Field: Properties::Price ; Skip empty or invalid records ]
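Every value in Properties::Features must be a JSON array that lists the features in the same order on every record. The Python sketch below shows that encoding.

The `features_text` helper and the source records are hypothetical, because the example doesn't say where the three values come from. For instance, they could come from separate square footage, bedroom, and age fields.

```python
import json


def features_text(square_feet, bedrooms, age_years):
    """Encode one house as the JSON array text stored in Properties::Features."""
    # Fixed order on every record: square footage, bedrooms, age.
    return json.dumps([square_feet, bedrooms, age_years])


# Hypothetical source records.
houses = [
    {"sqft": 1200, "bedrooms": 3, "age": 15},
    {"sqft": 2100, "bedrooms": 4, "age": 3},
]
feature_texts = [features_text(h["sqft"], h["bedrooms"], h["age"]) for h in houses]
```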

Example 2 - Get embedding vectors and train a model

Trains a regression model to predict a star rating (1 to 5) based on the text of a customer's review.

The training data is in the Reviews table and consists of ReviewText (a text field containing the customer's review) and Rating (a number field containing the star rating that the customer chose). After the script configures an AI account for the AI Model Server installed with FileMaker Server, it uses that account to insert text embedding vectors in the ReviewEmbedding container field based on data in the ReviewText field.

Then the script trains the regression model, naming it "ReviewModel" and using the training vectors in ReviewEmbedding, the target values in the Rating field, and custom parameters. When training is complete, the model is saved in the global container field ReviewModel.


Configure AI Account [ Account Name: "AI_Model_Server" ; Model Provider: Custom ; Endpoint: "https://myserver.example.com:8080/" ; Verify SSL Certificates ; API key: Global::API_Key ]

Go to Layout [ "Reviews" (Reviews) ]

Insert Embedding in Found Set [ Account Name: "AI_Model_Server" ; Embedding Model: "all-MiniLM-L12-v2" ; Source Field: Reviews::ReviewText ; Target Field: Reviews::ReviewEmbedding ; Continue on error ; Show summary ]

Set Variable [ $parameters ; Value: 
  Let ( [
    json = "{}" ;
    json = JSONSetElement ( json; "numTrees"; 15; JSONNumber ) ;
    json = JSONSetElement ( json; "maxDepth"; 15; JSONNumber ) ;
    json = JSONSetElement ( json; "minSamplesSplit"; 3; JSONNumber ) ;
    json = JSONSetElement ( json; "numFeatures"; 1000; JSONNumber ) ;
    json = JSONSetElement ( json; "maxFeatures"; 0; JSONNumber )
  ] ;
    json
  )
]

Configure Regression Model [ Action: Train Model ; Model Name: "ReviewModel" ; Algorithm: Random Forest ; Training Vectors Field: Reviews::ReviewEmbedding ; Training Target Field: Reviews::Rating ; Skip empty or invalid records ; Parameters: $parameters ; Save Model To: Reviews::ReviewModel ]

The model named "ReviewModel" is ready to use. See Example 2 in PredictFromModel.

Example 3 - Calculate mean squared error

Calculates the mean squared error to assess the quality of a trained regression model (see Notes).

The script goes to the Reviews layout containing the training data used in Example 2 and shows all records. It then sets the $squaredLossSum variable to 0, loads the previously saved model and names it "ReviewModel," and goes to the first record.

In the loop, for each record, the difference between the value predicted by the model using the PredictFromModel function and the actual value in the Rating field is squared and added to $squaredLossSum.

After looping through all records, the MSE value is calculated by dividing $squaredLossSum by the number of samples (the number of records in the training data set) and displayed in a dialog. When done, the script unloads the model from memory.


Go to Layout [ "Reviews" (Reviews) ]
Show All Records
Set Variable [ $squaredLossSum; Value: 0 ]

Configure Regression Model [ Action: Load Model ; Model Name: "ReviewModel" ; Load Model From: Reviews::ReviewModel ]

Go to Record/Request/Page [ First ]
Loop [ Flush: Always ]
    Set Variable [ $loss; Value: Reviews::Rating - PredictFromModel ( "ReviewModel" ; Reviews::ReviewEmbedding ) ]
    Set Variable [ $squaredLoss ; Value: $loss^2 ]
    Set Variable [ $squaredLossSum ; Value: $squaredLossSum + $squaredLoss ]
    Go to Record/Request/Page [ Next ; Exit after last: On ]
End Loop

Show Custom Dialog [ "MSE Value" ; $squaredLossSum / Get ( FoundCount ) ]

Configure Regression Model [ Action: Unload Model ; Model Name: "ReviewModel" ]

A possible MSE value is .01875826440712939518.
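The same calculation as the Example 3 loop can be written as a small Python function for checking the arithmetic outside FileMaker. `mean_squared_error` is a hypothetical name. The ratings and predictions in the usage line are made-up values, not output from a trained model.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference, as accumulated in $squaredLossSum
    and divided by the record count in Example 3."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_loss_sum = 0.0
    for a, p in zip(actual, predicted):
        loss = a - p
        squared_loss_sum += loss ** 2
    return squared_loss_sum / len(actual)


# Made-up ratings vs. predictions: about 0.1667
example_mse = mean_squared_error([4, 5, 3], [4.0, 4.5, 3.5])
```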
