Find big spenders

Identify which customers are likely to make large future purchases

This tutorial uses the Faraday API to identify which of your customers are likely to become "big spenders". You upload customer identifiers—we provide all of the rich consumer data necessary to build and employ predictive models. Open the above recipe for a list of API requests or keep reading for additional details.

📘

You can't accidentally incur charges

The steps in this guide are completely free. You won't be charged until you want to start retrieving predictions at scale.

Account & credentials

Create a free account if you haven't already. You will immediately get an API key that works for test data.

If you would like to use your own data, fill out this form. If you have questions about security, please see Security and privacy - we have handled hundreds of brands' PII since 2012 and will protect yours with the same controls. Note that there is strict isolation between accounts and your data will not be mixed in any way with other brands' data.

Prepare and send your data

You are ready to send some data over to Faraday. This is done by placing your data into a CSV file and sending it through the API.

📘

Sample data

Don't have access to customer data just yet? No problem — grab our sample data from the Testing page.

Make a CSV

For this tutorial, your data source should include information about your customers and their orders. You will need to format your data as a CSV. See Sending data to Faraday for examples and validation details.

To identify your customers, your CSV should include the columns:

  • customer ID
  • first name
  • last name
  • address
  • city
  • state

But you could also (or alternatively) include:

  • email
  • phone

Additionally, your CSV should include columns that describe your customers' orders:

  • total (the total amount of the order)
  • date (an order timestamp)

🚧

Include a header row

Your CSV file should have a "header" row, but you can use any headers you like. We suggest using recognizable headers that make sense to you.
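As a rough illustration of the layout described above, the following Python sketch writes a small CSV with a header row, one row per order. The header names are a free choice: `customer_id` is a hypothetical header for the customer ID column, and the others mirror the column names used in the mapping example later in this tutorial. The rows are invented placeholder values, not real customers.

```python
import csv
import io

# Headers are your choice; these mirror the columns listed above.
FIELDNAMES = ["customer_id", "first_name", "last_name",
              "address", "city", "state", "total", "date"]


def orders_to_csv(orders):
    """Serialize a list of order dicts to CSV text that starts with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(orders)
    return buffer.getvalue()


# Invented placeholder rows (one row per order).
sample_orders = [
    {"customer_id": "c-001", "first_name": "Ada", "last_name": "Example",
     "address": "1 Main St", "city": "Burlington", "state": "VT",
     "total": "125.50", "date": "2023-01-15"},
    {"customer_id": "c-002", "first_name": "Grace", "last_name": "Sample",
     "address": "22 Elm Ave", "city": "Montpelier", "state": "VT",
     "total": "480.00", "date": "2023-02-03"},
]

csv_text = orders_to_csv(sample_orders)
print(csv_text)
```

Writing `csv_text` to a file such as acme_orders.csv would give you something to upload in the next step.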

Uploading your CSV

After preparing your CSV file, you are going to upload it using the API's upload endpoint.

Files are always uploaded to a subfolder underneath uploads. The example below uploads a local file named acme_orders.csv to the folder and file orders/file1.csv on Faraday. You can pick whatever folder name and filename you want; the folder name is used in the next step. You can even upload multiple files with the same column structure into the same folder if that's easier, and they will all be merged together. This is especially useful if you want to update your model over time, for example as new orders come in.

curl --request POST \
     --url https://api.faraday.ai/v1/uploads/orders/file1.csv \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/octet-stream' \
     --verbose \
     --data-binary "@acme_orders.csv"
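If you would rather script the upload than call curl, the same request can be sketched with Python's standard library. This is a minimal sketch that only builds the request object, using the URL and headers from the curl example above; the helper name `build_upload_request` and the inline CSV bytes are illustrative assumptions.

```python
import urllib.request

API_BASE = "https://api.faraday.ai/v1"


def build_upload_request(api_key, folder, filename, csv_bytes):
    """Build (but do not send) the POST request that uploads a CSV file."""
    return urllib.request.Request(
        url=f"{API_BASE}/uploads/{folder}/{filename}",
        data=csv_bytes,  # raw file contents, like curl's --data-binary
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


# Placeholder bytes; in practice read them from your file, e.g.
# open("acme_orders.csv", "rb").read()
request = build_upload_request(
    "YOUR_API_KEY", "orders", "file1.csv", b"total,date\n10.00,2023-01-15\n"
)
print(request.get_method(), request.full_url)

# To actually send the file: urllib.request.urlopen(request)
```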

Mapping your data

Once your file has finished uploading, Faraday needs to know how to understand it. You'll use datasets to define this mapping. Below is an example API call for the sample file.

curl --request POST \
     --url https://api.faraday.ai/v1/datasets \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "identity_sets": {
          "customer": {
               "house_number_and_street": [
                    "address"
               ],
               "person_first_name": "first_name",
               "person_last_name": "last_name",
               "city": "city",
               "state": "state"
          }
     },
     "output_to_streams": {
          "orders": {
               "data_map": {
                    "datetime": {
                         "column_name": "date",
                         "format": "date_iso8601"
                    },
                    "value": {
                         "column_name": "total",
                         "format": "currency_dollars"
                    }
               }
          }
     },
     "options": {
          "type": "hosted_csv",
          "upload_directory": "orders"
     }
}
'

Let's break down the above example.

  • upload_directory: This tells Faraday which files you mean by naming the subfolder you uploaded your data to (orders in the example above). If there are multiple files in this folder, and they all have the same structure, they will be merged together.
  • identity_sets: This is where you specify how Faraday should recognize the people in each of your rows. Your data may have multiple identities per row, especially in lists of orders where you may have separate billing and shipping info. The example above creates one identity set with the arbitrary name customer. It uses name and address, but if you have emails or phone numbers, it's important to include them to improve identity resolution. Faraday will always use the best combination of available identifiers to recognize people. Mapping options are available in Datasets.
  • output_to_streams: This is where you tell Faraday how to recognize events in your data. The example calls its events orders, because that's how many companies define their customers' transactional behavior, but you can use any name you like, and one dataset may represent multiple event types. We recommend (but do not require) that you specify a datetime field, which is the column date in the sample CSV. Since the sample CSV also has order totals, you can specify a value field as well, in this case the column total. You can also include metadata about products involved in the event as well as a channel (e.g. lead source), although that's not necessary here.

📘

Mapping without datetime

Faraday also supports data that is not organized by events with timestamps. An example could be a list of leads. The datasets endpoint can handle the creation of a "dateless" dataset by removing the datetime key from the data_map block. In the above request, this would look like "data_map": {"value": {"column_name": "total","format": "currency_dollars"}}.
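To make the difference between the dated and "dateless" mappings concrete, here is a small Python sketch that assembles the request body for the datasets endpoint. It only reuses the keys from the curl example above; the helper name `build_dataset_payload` and its parameters are hypothetical and not part of the API.

```python
import json


def build_dataset_payload(upload_directory, datetime_column=None, value_column=None):
    """Build the JSON body for POST /v1/datasets.

    Leaving datetime_column as None omits the "datetime" key from the
    data_map, which gives the "dateless" variant described above.
    """
    data_map = {}
    if datetime_column is not None:
        data_map["datetime"] = {"column_name": datetime_column,
                                "format": "date_iso8601"}
    if value_column is not None:
        data_map["value"] = {"column_name": value_column,
                             "format": "currency_dollars"}
    return {
        "identity_sets": {
            "customer": {
                "house_number_and_street": ["address"],
                "person_first_name": "first_name",
                "person_last_name": "last_name",
                "city": "city",
                "state": "state",
            }
        },
        "output_to_streams": {"orders": {"data_map": data_map}},
        "options": {"type": "hosted_csv", "upload_directory": upload_directory},
    }


dated = build_dataset_payload("orders", datetime_column="date", value_column="total")
dateless = build_dataset_payload("orders", value_column="total")
print(json.dumps(dateless["output_to_streams"], indent=2))
```

The output of `json.dumps(dated)` can be passed as the `--data` argument of the curl request shown earlier.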

Create your cohorts

Now you're going to use this identity and event data to formally define groups of people. You will reference these groups both when you build your outcome (model) and when you later generate predictions based on that model. We call these formal groups of people Cohorts, and you create them with the cohorts endpoint. For this tutorial, you will need to create two cohorts.

Customers cohort

First, create a cohort that includes all the people in the dataset you created. To do this, point the cohort at the orders stream you defined above and give it a name like "customers". By default, a cohort defined from a stream captures the first date in the stream, which in this case is the date of the first order.

curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "customers",
     "stream_name": "orders"
}
'

When this request succeeds, you'll get an ID for your customers cohort that you will need later in this tutorial (referred to as YOUR_CUSTOMERS_COHORT_ID in example requests).

Big spenders cohort

Next, you want to define a subset of your customers that have total sales exceeding, for example, $400. You can create this "big spenders" cohort with nearly the same request used to create the customers cohort. The additional min_value parameter defines big spenders as individuals with total orders exceeding the specified threshold (in this case $400).

curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "big spenders",
     "stream_name": "orders",
     "min_value": 400
}
'

When this request succeeds, you'll get an ID for your big spenders cohort that you will need later in this tutorial (referred to as YOUR_BIG_SPENDERS_COHORT_ID in example requests).

Build your propensity outcome

Now that you've formally defined your customer groups, it's time to move on to prediction. Faraday uses an abstraction called an Outcome to configure propensity objectives. To define an outcome, you need to know:

  • Attainment cohort (required): the group of people who have attained the outcome in the past and therefore serve as examples.
  • Eligibility cohort (optional): the group of people who are technically allowed to attain the outcome. If you don't specify one, we'll assume all US adults are eligible.

For this tutorial, we're going to create an outcome as follows:

  • Attainment cohort: big spenders
  • Eligibility cohort: customers

You will take the cohort IDs returned in the previous step and use them to create a "big spenders" outcome:

curl --request POST \
     --url https://api.faraday.ai/v1/outcomes \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "attainment_cohort_id": "YOUR_BIG_SPENDERS_COHORT_ID",
     "eligible_cohort_id": "YOUR_CUSTOMERS_COHORT_ID",
     "name": "big spenders"
}
'

When this request succeeds, you'll get an ID for your outcome that you will need later in this tutorial (referred to as YOUR_OUTCOME_ID in example requests).

📘

Conversion versus generation outcomes

The outcome you constructed is intended to predict the propensity of your customers to become big spenders, in other words, to predict a conversion event. If instead you want to find new customers that look like your big spenders, that can also be accomplished by building an outcome. To do this you need to remove the eligible_cohort_id from the above request. The resulting outcome predicts the propensity of a given US household to become a new big spender.

Outcomes with eligible_cohort_id omitted build, by default, a national US model using Faraday's responsibly sourced data on the US population.
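The only difference between the two kinds of outcome request is whether eligible_cohort_id is present. A minimal Python sketch of that choice, assuming the field names from the curl example above (the helper `build_outcome_payload` and the name "new big spenders" are hypothetical):

```python
def build_outcome_payload(name, attainment_cohort_id, eligible_cohort_id=None):
    """Build the JSON body for POST /v1/outcomes.

    With eligible_cohort_id set, the outcome scores existing customers
    (conversion). With it omitted, the key is left out of the body
    (generation).
    """
    payload = {"name": name, "attainment_cohort_id": attainment_cohort_id}
    if eligible_cohort_id is not None:
        payload["eligible_cohort_id"] = eligible_cohort_id
    return payload


conversion = build_outcome_payload(
    "big spenders", "YOUR_BIG_SPENDERS_COHORT_ID", "YOUR_CUSTOMERS_COHORT_ID"
)
generation = build_outcome_payload(
    "new big spenders", "YOUR_BIG_SPENDERS_COHORT_ID"
)
print(conversion)
print(generation)
```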

Learn about your model (optional)

Once the model has finished building, you can look at the outcome model report.

curl --request GET \
     --url https://api.faraday.ai/v1/outcomes/YOUR_OUTCOME_ID/report.html \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header 'Accept: text/html'

🚧

Waiting for outcomes to build

Model reports are made available after an outcome has finished building. Requesting a report prior to this will result in an INTERNAL_SERVER_ERROR.

Generate propensity predictions

Finally, you can tell Faraday who you may want propensity predictions for, and then retrieve those results.

To do this, you will first create a Scope—this is how you tell Faraday which predictions you may want on which populations. You'll need three IDs from resources you created in this tutorial:

  1. YOUR_OUTCOME_ID (the big spenders outcome you created)
  2. YOUR_CUSTOMERS_COHORT_ID
  3. YOUR_BIG_SPENDERS_COHORT_ID

curl --request POST \
     --url https://api.faraday.ai/v1/scopes \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "payload": {
          "outcome_ids": [
               "YOUR_OUTCOME_ID"
          ]
     },
     "population": {
          "cohort_ids": [
               "YOUR_CUSTOMERS_COHORT_ID"
          ],
          "exclusion_cohort_ids": [
               "YOUR_BIG_SPENDERS_COHORT_ID"
          ]
     },
     "name": "SCOPE_NAME"
}
'

Rather than defining a new cohort to get predictions for, you specify which cohorts to include (cohort_ids) and which to exclude (exclusion_cohort_ids) in the population. It is a common pattern to include your eligible cohort (in this case customers) and exclude your attainment cohort (in this case big spenders).

When this request succeeds, you'll get an ID for your scope that you will need later in this tutorial (referred to as YOUR_SCOPE_ID in example requests).
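The requests so far form a chain in which each response ID feeds a later request. The Python sketch below shows that threading under two assumptions: the ID is returned in a response field named `id`, and `post` is a function you supply that sends a JSON body to a path under https://api.faraday.ai/v1 and returns the decoded response. A stand-in `fake_post` is used here so the sketch runs without network access or an API key.

```python
def create_scope_for_big_spenders(post, scope_name="SCOPE_NAME", min_value=400):
    """Create both cohorts, the outcome, and the scope, passing IDs along."""
    customers_id = post("/cohorts", {
        "name": "customers", "stream_name": "orders"})["id"]
    big_spenders_id = post("/cohorts", {
        "name": "big spenders", "stream_name": "orders",
        "min_value": min_value})["id"]
    outcome_id = post("/outcomes", {
        "name": "big spenders",
        "attainment_cohort_id": big_spenders_id,
        "eligible_cohort_id": customers_id})["id"]
    scope_id = post("/scopes", {
        "name": scope_name,
        "payload": {"outcome_ids": [outcome_id]},
        "population": {
            "cohort_ids": [customers_id],              # who to score
            "exclusion_cohort_ids": [big_spenders_id]  # who to leave out
        }})["id"]
    return {"customers_cohort_id": customers_id,
            "big_spenders_cohort_id": big_spenders_id,
            "outcome_id": outcome_id,
            "scope_id": scope_id}


# Stand-in transport: records each call and returns a made-up ID.
calls = []


def fake_post(path, payload):
    calls.append((path, payload))
    return {"id": f"fake-id-{len(calls)}"}


ids = create_scope_for_big_spenders(fake_post)
print(ids)
```

Swapping `fake_post` for a function that performs real HTTP requests would run the same sequence against your account.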

📘

Demo scopes

When creating a scope, you can explicitly set demo to true, although that is already the default. This puts the scope in preview mode, which limits its output so that you do not incur billing charges.

Deploying predictions

Now it's time to download the results! The simplest way to do this is to retrieve them all in a single CSV file.

Add a target

First you'll add a Target to your scope with publication type hosted_csv.

curl --request POST \
     --url https://api.faraday.ai/v1/targets \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "options": {
          "type": "hosted_csv"
     },
     "name": "big spender scores",
     "scope_id": "YOUR_SCOPE_ID"
}
'

When this request succeeds, you'll get an ID for your target that you will need later in this tutorial (referred to as YOUR_TARGET_ID in example requests).

📘

Publication versus replication targets

With a publication target, such as "type": "hosted_csv" in the options block above, Faraday hosts your predictions for retrieval. Alternatively, Faraday can copy your predictions to systems that you control. These targets are called replication targets and require a connection.

Check deployment status

Before downloading your CSV, check whether the resource (along with its dependencies) is ready:

curl --request GET \
     --url https://api.faraday.ai/v1/targets/YOUR_TARGET_ID \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY'
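Because builds take time, a script will usually repeat this check until the target is ready. The sketch below is one way to do that in Python. It assumes the JSON response carries a `status` field whose value becomes "ready" on success and "error" on failure; treat those names as assumptions and confirm them against the response you actually receive. `fetch_resource` stands for any function that performs the GET request above and returns the decoded JSON.

```python
import time


def wait_until_ready(fetch_resource, interval_seconds=30.0, max_attempts=20,
                     sleep=time.sleep):
    """Poll fetch_resource() until it reports status "ready".

    Raises RuntimeError if the status is "error", and TimeoutError if the
    resource is still not ready after max_attempts checks.
    """
    for attempt in range(max_attempts):
        resource = fetch_resource()
        status = resource.get("status")  # assumed field name
        if status == "ready":
            return resource
        if status == "error":
            raise RuntimeError(f"resource failed to build: {resource}")
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"resource not ready after {max_attempts} checks")


# Demonstration with canned responses instead of real API calls.
canned = iter([{"status": "running"}, {"status": "running"}, {"status": "ready"}])
target = wait_until_ready(lambda: next(canned), sleep=lambda seconds: None)
print(target["status"])
```

The `sleep` parameter exists only so the demonstration (and any tests) can skip the real delay.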

Retrieve your predictions

Once your deployment is ready, you can download the hosted CSV that was created when you added your target.

curl --request GET \
     --url https://api.faraday.ai/v1/targets/YOUR_TARGET_ID/download.csv \
     --header 'Accept: text/csv' \
     --header 'Authorization: Bearer YOUR_API_KEY'

Looking at the response, you'll see that each one of your customers (excluding big spenders) has a fdy_outcome_OUTCOME_ID_propensity_score and a fdy_outcome_OUTCOME_ID_propensity_percentile. The score is the raw output of the model and the percentile is computed with respect to the raw scores and the individuals defined in the scope.

These values measure the propensity of your customers to become big spenders and you can now take business actions based on these predictions!
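One simple way to act on the file is to rank customers by percentile. The Python sketch below parses a downloaded CSV and returns the highest-ranked rows. It locates the percentile column by its fdy_outcome_ prefix and _propensity_percentile suffix, as described above. The sample text, including the `customer_id` column, the outcome ID `abc123`, and every value, is invented for illustration; the columns in your real file may differ.

```python
import csv
import io


def top_prospects(csv_text, limit=10):
    """Return up to `limit` rows with the highest propensity percentile."""
    reader = csv.DictReader(io.StringIO(csv_text))
    percentile_column = next(
        name for name in reader.fieldnames
        if name.startswith("fdy_outcome_")
        and name.endswith("_propensity_percentile")
    )
    rows = sorted(reader, key=lambda row: float(row[percentile_column]),
                  reverse=True)
    return rows[:limit]


# Invented sample standing in for the downloaded file.
sample_download = (
    "customer_id,fdy_outcome_abc123_propensity_score,"
    "fdy_outcome_abc123_propensity_percentile\n"
    "c-001,0.12,35\n"
    "c-002,0.81,97\n"
    "c-003,0.44,70\n"
)

best = top_prospects(sample_download, limit=2)
print([row["customer_id"] for row in best])
```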

In production, you'll generally automate the retrieval of this file and its insertion into your data warehouse and other systems. Faraday supports integration with a wide variety of tools.

🚧

Preview mode

Since the scope is in preview mode by default, you will only get a sample of the complete results back. This helps you validate the results you're getting and build your integrations before incurring charges.