Registering datasets

Describing your data so Faraday can understand what it means

Raw CSV data isn't quite enough to make predictions. First, Faraday has to understand what your data means.

Registering a dataset

To register a dataset, send a request to the POST /datasets endpoint. The main part of the request looks like this:

curl --request POST https://api.faraday.ai/v1/datasets \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
// options go here . . .
}
'

Below you'll find details on each of the three major options you need to provide to register a dataset.

Folder reference

As you learned in the previous step (Uploading raw data), you upload one or more CSVs to a top-level folder within your inbox. This folder contains all the data for your new dataset.

For example, if you uploaded:

  • /shopify/initial_upload.csv
  • /shopify/2021-08-09.csv
  • /shopify/2021-08-10.csv

Then your upload directory is shopify.

{
    "options": {
      "type": "hosted_csv",
      "upload_directory": "shopify"
    },
    ...
}
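The folder rule above can be checked before you register anything. The following is a minimal Python sketch, not part of Faraday's tooling: the helper name upload_directory_for is hypothetical, and it assumes only that every uploaded path sits inside one shared top-level folder.

```python
from pathlib import PurePosixPath


def upload_directory_for(paths):
    """Return the single top-level folder shared by all uploaded paths."""
    top_levels = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        # Drop the leading "/" so "/shopify/a.csv" and "shopify/a.csv" agree.
        if parts and parts[0] == "/":
            parts = parts[1:]
        if len(parts) < 2:
            raise ValueError(f"{path!r} is not inside a top-level folder")
        top_levels.add(parts[0])
    if len(top_levels) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(top_levels)}")
    return top_levels.pop()


uploads = [
    "/shopify/initial_upload.csv",
    "/shopify/2021-08-09.csv",
    "/shopify/2021-08-10.csv",
]
# Same shape as the "options" fragment shown above.
options = {"type": "hosted_csv", "upload_directory": upload_directory_for(uploads)}
```

Raising on mixed folders is a design choice for this sketch: it catches files that landed in the wrong folder before the dataset is registered.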

Identities

Because your data is about people, Faraday needs to understand how to recognize these people. Typically, you will have columns in your data with information like names, addresses, emails, and phone numbers. This is your opportunity to tell Faraday that, for example, your fn column is a "First name."

In many cases, your data may include multiple identities per person, for example, shipping and billing addresses. That's why you specify your identities as a set of named entries under identity_sets, one entry per identity. See the POST /datasets reference for more information.

{
  ...
    "identity_sets": {
        "customer": {
            "person_first_name": "fn"
            ...
        }
    },
  ...
}
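A mapping like the one above is only useful if every column it names actually exists in your CSV. The sketch below, in Python, checks this locally before you send anything. It is an illustration under stated assumptions: the helper missing_identity_columns, the second identity set named billing, and the sample columns account_email and billing_email are hypothetical, and the email field name is taken from the full example later on this page. Check the POST /datasets reference for the complete list of supported fields.

```python
import csv
import io


def missing_identity_columns(identity_sets, csv_text):
    """Return (set name, field, column) triples whose column is not in the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = []
    for set_name, fields in identity_sets.items():
        for field, column in fields.items():
            if column not in header:
                missing.append((set_name, field, column))
    return missing


# Hypothetical sample file; only the "fn" column comes from the text above.
sample_csv = "fn,account_email,billing_email\nAda,ada@example.com,billing@example.com\n"

# Two identities for the same person, each mapped to its own columns.
identity_sets = {
    "customer": {"person_first_name": "fn", "email": "account_email"},
    "billing": {"email": "billing_email"},  # hypothetical second identity
}

problems = missing_identity_columns(identity_sets, sample_csv)
```

An empty problems list means every mapped column was found in the header. It says nothing about whether Faraday accepts the field names themselves.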

Events

Finally, Faraday needs to understand what behaviors are being exhibited in your data. We call these events. For example, if you've uploaded order data, each row represents an order event. And each of these events was experienced by an individual person (see "Identities" above).

When mapping your data to events, you can indicate when the event happened (datetime) and the monetary value of the event (value). All fields are optional, but we recommend mapping datetime whenever possible to improve the accuracy of predictions.

{
  ...
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": {
                    "column_name": "updated_at",
                    "format": "date_iso8601"
                }
            }
        }
    },
  ...
}
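Because datetime matters for prediction accuracy, it can help to spot-check the column before declaring it as date_iso8601. The Python sketch below is a rough local check, not Faraday's own parser: the helper unparseable_datetimes and the sample values are hypothetical, and Python's datetime.fromisoformat accepts only a subset of ISO 8601, so a value flagged here is a prompt to look closer rather than proof that Faraday would reject it.

```python
from datetime import datetime


def unparseable_datetimes(values):
    """Return the values that Python cannot read as ISO 8601 timestamps."""
    bad = []
    for value in values:
        # Older Python versions reject a trailing "Z", so spell out the UTC offset.
        candidate = value[:-1] + "+00:00" if value.endswith("Z") else value
        try:
            datetime.fromisoformat(candidate)
        except ValueError:
            bad.append(value)
    return bad


# Same shape as the "orders" stream in the fragment above.
orders_stream = {
    "data_map": {
        "datetime": {"column_name": "updated_at", "format": "date_iso8601"}
    }
}

# Hypothetical values read from the "updated_at" column.
sample_values = ["2021-08-09T14:30:00Z", "2021-08-10", "08/10/2021"]
bad_values = unparseable_datetimes(sample_values)
```

Here bad_values contains only "08/10/2021", the one value that is not written in ISO 8601 form.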

Putting it all together

curl --request POST https://api.faraday.ai/v1/datasets \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
    "name": "DATASET_NAME",
    "options": {
      "type": "hosted_csv",
      "upload_directory": "shopify"
    },
    "identity_sets": {
        "customer": {
            "email": "account_email"
        }
    },
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": {
                    "column_name": "updated_at",
                    "format": "date_iso8601"
                }
            }
        }
    }
}
'
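If you would rather script the call than use curl, the same request can be assembled with Python's standard library. This is a sketch under assumptions rather than an official client: build_dataset_request is a hypothetical helper, and the URL, headers, and body simply mirror the curl command above. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request

API_URL = "https://api.faraday.ai/v1/datasets"


def build_dataset_request(api_key, name, upload_directory, identity_sets, output_to_streams):
    """Build (but do not send) the POST /datasets request."""
    body = {
        "name": name,
        "options": {"type": "hosted_csv", "upload_directory": upload_directory},
        "identity_sets": identity_sets,
        "output_to_streams": output_to_streams,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_dataset_request(
    api_key="YOUR_API_KEY",
    name="DATASET_NAME",
    upload_directory="shopify",
    identity_sets={"customer": {"email": "account_email"}},
    output_to_streams={
        "orders": {
            "data_map": {
                "datetime": {"column_name": "updated_at", "format": "date_iso8601"}
            }
        }
    },
)

# To send it, uncomment the lines below (requires a real API key):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the body from a dictionary and serializing it with json.dumps avoids the quoting mistakes that are easy to make when editing JSON inside a shell string.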

What's next

Now use your datasets to identify important groups of people, which we call cohorts.