Uploading raw data

Getting CSVs of customer data to Faraday

First, you'll send raw customer data to Faraday.

Sending data to Faraday

You will send data to Faraday using CSV files. This is the "lingua franca" of data on the internet and the simplest, easiest way for developers to exchange machine-readable data.

📘

Don't have any data?

If you don't have any data, or you want to start with sample data, please see the Testing page to download a CSV.

Structuring your data

Faraday is optimized to work with event data. For example, we prefer a file containing all of your orders over a file containing all of your customers.
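For example, an orders file contains one row per purchase event, so a repeat buyer simply appears on several rows (the column names here are only illustrative):

date,email,total
2023-06-01,jane@example.com,89.50
2023-07-15,jane@example.com,34.25
2023-07-16,sam@example.com,120.00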

Data folders

Files are uploaded to "data folders," which are accessible only to your account.

  • Each folder should contain a unique type of data, and all files in a folder should share the same column structure. For example, if you have orders from an e-commerce platform—spanning one or more distinct files—those should all be uploaded to a single folder, perhaps called web_orders.
  • Let's say you have a separate orders database from a point-of-sale or phone system. Files from that system should go into a separate folder, such as phone_orders.
  • Now, regardless of how they bought, they're all your customers! Faraday lets you merge all this data together by specifying the same stream in the dataset mapping for each folder.
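In practice, each new file is a single upload into its folder. The request below is only a sketch (see the API reference for the exact upload endpoint and headers), but it shows the idea: one dated file pushed into a web_orders folder.

curl --request POST https://api.faraday.ai/v1/uploads/web_orders/orders_2024-01-15.csv \
     --header "Authorization: Bearer YOUR_API_TOKEN" \
     --header "Content-Type: text/csv" \
     --data-binary @orders_2024-01-15.csv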

Incremental updates

Faraday is optimized to accept an ongoing sequence of unique, incremental files—perhaps daily—in your data folders. But we do support multiple ways to specify the uniqueness and recency of rows in the dataset mapping.

Upsert columns

In some cases a certain event such as an order could be "updated" in a subsequent file. It's important that Faraday not treat the "update" as a new event! To accomplish this, you can specify one or more columns that uniquely identify a row in upsert_columns. This could be an explicit "order ID" column, or it can be a natural composite key such as a combination of customer ID and timestamp. Using upsert_columns is the preferred way to specify uniqueness in your data set.

As a bonus, we will return these columns to you when you retrieve predictions so that joining the data back is straightforward!
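For illustration, an explicit order ID could be declared with a fragment like the one below when you register the dataset (see the dataset reference for the exact placement of upsert_columns in the request body):

"upsert_columns": ["order_id"]

A natural composite key works the same way, for example ["customer_id", "ordered_at"].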

Incremental column

If you cannot provide a unique identifier with upsert_columns, the next alternative is to use incremental_column—which must refer to a datetime column. The incremental column will be used to ignore any rows older than the most recent record already ingested. This option is only useful if you 1) don't have a unique ID per row and 2) have to send over your entire data set each time you push to Faraday.
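Like upsert_columns, this is part of the dataset configuration; an illustrative fragment, assuming your file has an updated_at datetime column:

"incremental_column": "updated_at"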

Every row unique by default

If you cannot use upsert columns, or if you can guarantee you don't upload duplicate rows, we will simply treat every row as a unique entry.

Format

The data you send Faraday should be in standard CSV format with column headers. You will need a minimum of two types of columns:

  1. Timestamp column(s) that represent a point in time when the event described by the row occurred.
  2. Identity column(s) that identify people, such as by email, name, and/or address. See datasets for the full list of available identity fields. The more identity columns you include, the better Faraday's identity resolution will work.
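A file containing nothing more than a timestamp and a single identity column already meets this minimum (illustrative columns again):

ordered_at,email
2023-06-01T14:03:22Z,jane@example.com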

Other metadata

Beyond these minimum requirements, you can also specify products associated with the event and/or a dollar value associated with the event. Some prediction types may require these metadata.
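For instance, an orders file might carry both a product column and a dollar value column (again, purely illustrative names):

date,email,product,total
2023-06-01,jane@example.com,running-shoes,89.50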

📘

Column names

The exact names of the fields are up to you. You will map your column names to their intended purpose using the Dataset resource.

Example

Here's an example orders file:

📂 Sample order data
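For illustration, the first few rows of such a file might look like this (your column names can differ, as noted above):

first_name,last_name,address,city,state,date,total
Jane,Doe,123 Main St,Burlington,VT,2023-06-01,89.50
Carlos,Rivera,9 Elm Ave,Albany,NY,2023-06-02,42.00

The request below shows how those columns could then be mapped when you register the dataset, which is the next step in this guide: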

curl --request POST https://api.faraday.ai/v1/datasets \
     --header "Authorization: Bearer YOUR_API_TOKEN" \
     --header "Content-Type: application/json" \
     --verbose \
     --data '
{
  "options": {
    "type": "hosted_csv",
    "upload_directory": "orders"
  },
  "identity_sets": {
    "customer": {
      "person_first_name": "first_name",
      "person_last_name": "last_name",
      "house_number_and_street": ["address"],
      "city": "city",
      "state": "state"
    }
  },
  "output_to_streams": {
    "orders": {
      "data_map": {
        "datetime": {
          "column_name": "date",
          "format": "date_iso8601"
        },
        "value": {
          "column_name": "total"
        }
      }
    }
  }
} 
'

What's next

Now you'll describe your data by registering a dataset