Sift inputs define the data that is sent into a Sift from external systems. They can currently be specified as webhooks, rpc, emails or slack-bot. A DAG may consume any combination of inputs, and input names must be unique within a DAG.

Example:

{
  "inputs": {
    "webhooks": {
      "#": "myWebhook below is now available as a bucket.",
      "myWebhook": {
        "inbound": {
          "uri": "{key}/{value}"
        }
      }
    },
    "rpc":{
      "simple_rpc":{
        "methods": ["GET"],
        "path": "/simple",
        "CORS":{}
      }
    },
    "emails": {
      "#1": "amazon below is now available as a bucket.",
      "amazon": {
        "filter": {
          "conditions": [
            {
              "from": {
                "regexp": {
                  "flags": "i",
                  "pattern": ".*@amazon\\.com"
                }
              }
            },
            {
              "date": "between now and 2015-06-01T00:00:00Z"
            },
            {
              "minSize": 100000
            },
            {
              "isUnread": true
            },
            {
              "header": [
                "Domainkey-Signature"
              ]
            }
          ],
          "operator": "AND"
        },
        "inMailbox": "inbox"
      },
      "#2": "paypal below is another bucket with a different filter.",
      "paypal": {
        "...": {}
      }
    },
    "slack-bot": {
      "_config": {
        "ambientMentionDuration": 300,
        "permission": "personal"
      },
      "#": "slackall is a bucket with all the direct mentions and direct messages sent to the bot",
      "slackall": {
        "filter": {
          "conditions": [
            {
              "type": "message:direct_mention,direct_message"
            },
            {
              "text": {
                "regexp": {
                  "pattern": ".*",
                  "flags": "i"
                }
              }
            }
          ],
          "operator": "AND"
        }
      }
    }
  }
}
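Input names must be unique within a DAG, whichever input type declares them. As a hedged illustration, the hypothetical helper below (not part of any Sift tooling) walks an inputs object like the one above and reports clashing names. It assumes that keys starting with "#" are comments and that _config holds bot configuration rather than an input, as the example suggests.

```javascript
// Hypothetical helper (not part of the Sift tooling): collect the input names
// declared in an "inputs" object and report any name used more than once.
// Assumptions: keys starting with "#" are comments, "_config" is configuration.
function findDuplicateInputNames(inputs) {
  const seen = new Map(); // input name -> input type that declared it first
  const duplicates = [];
  for (const [type, entries] of Object.entries(inputs)) {
    for (const name of Object.keys(entries)) {
      if (name.startsWith("#") || name === "_config") continue;
      if (seen.has(name)) {
        duplicates.push(`${name} (in ${seen.get(name)} and ${type})`);
      } else {
        seen.set(name, type);
      }
    }
  }
  return duplicates;
}

const inputs = {
  webhooks: { "#": "comment", myWebhook: { inbound: { uri: "{key}/{value}" } } },
  rpc: { simple_rpc: { methods: ["GET"], path: "/simple", CORS: {} } },
  emails: { amazon: {}, myWebhook: {} }, // "myWebhook" clashes with the webhook
};
console.log(findDuplicateInputNames(inputs));
// [ 'myWebhook (in webhooks and emails)' ]
```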

Webhook Inputs

The only available type of webhook is "inbound", as in the example above. It supports the following fields:

  • uri: The URI for the webhook, specified using URI Templates. More info on secure webhooks over HTTPS here.
  • jsonPath: Lets you assemble a JSON payload in the form.
  • response: Allows you to customise the Webhook response.

The Webhook input implementation is explained in more detail in the Webhooks section.
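URI Templates (RFC 6570) describe URIs with variable parts. As a rough illustration of how a template such as {key}/{value} could map an incoming path onto named values, here is a minimal sketch. The function name matchUriTemplate is hypothetical, and it assumes that each {name} captures exactly one path segment and that paths are relative to the webhook's base URL; the actual webhook implementation may behave differently.

```javascript
// Hypothetical sketch: match a request path against a simple URI Template
// such as "{key}/{value}". Assumes each {name} captures exactly one path
// segment and literal segments must match exactly; the real webhook
// implementation may support more of RFC 6570 than this.
function matchUriTemplate(template, path) {
  const tParts = template.split("/");
  const pParts = path.replace(/^\/+/, "").split("/");
  if (tParts.length !== pParts.length) return null;
  const vars = {};
  for (let i = 0; i < tParts.length; i++) {
    const m = /^\{(\w+)\}$/.exec(tParts[i]);
    if (m) {
      vars[m[1]] = decodeURIComponent(pParts[i]); // variable segment
    } else if (tParts[i] !== pParts[i]) {
      return null; // literal segment did not match
    }
  }
  return vars;
}

console.log(matchUriTemplate("{key}/{value}", "/order-42/shipped"));
// { key: 'order-42', value: 'shipped' }
console.log(matchUriTemplate("{key}/{value}", "/only-one-segment"));
// null
```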

Email Inputs

You can have any number of email ports, but each email port can have only one filter. A filter consists of an array of conditions and an operator (AND or OR) that determines how those conditions are combined.

Each condition names a property of the input (e.g. the from field of an email in the example above) and is evaluated according to that property's type:

  • string fields are matched against a regular expression,
  • date fields are compared with a date range string in the format described below,
  • email size is compared with a number of bytes (the RFC822 size of the message),
  • boolean fields are compared with the given value, and
  • headers are checked for the existence of the specified headers and their values.

Filter

If an email input port doesn't have a filter defined, no data will be returned.

Conditions can be nested by adding objects with the same structure to the conditions array, i.e. {"conditions": [], "operator": ""}.
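To show how a filter's conditions and operator combine, including nesting, here is a hypothetical evaluator for a simplified email object. It is a sketch, not the server's implementation: the email shape (plain string properties, a size in bytes, boolean flags and a headers object keyed by lowercase name) is assumed, header conditions only check that the headers exist, size bounds are treated as inclusive, a missing operator is treated as AND, and the date and text conditions are not covered.

```javascript
// Hypothetical sketch: evaluate a { conditions, operator } filter against a
// simplified email object whose shape is assumed for illustration.
function matchesCondition(cond, email) {
  // A nested { conditions, operator } object is evaluated recursively.
  if (Array.isArray(cond.conditions)) return matchesFilter(cond, email);
  return Object.entries(cond).every(([field, expected]) => {
    if (field === "date" || field === "text") {
      throw new Error(`"${field}" is not covered by this sketch`);
    }
    // String fields: { regexp: { pattern, flags } }
    if (expected && typeof expected === "object" && expected.regexp) {
      const { pattern, flags = "" } = expected.regexp;
      return new RegExp(pattern, flags).test(String(email[field] ?? ""));
    }
    // Size bounds in bytes (inclusive bounds assumed).
    if (field === "minSize") return email.size >= expected;
    if (field === "maxSize") return email.size <= expected;
    if (field === "header") {
      // Only checks that each named header exists; value matching is omitted.
      const names = Array.isArray(expected) ? expected : [expected];
      return names.every((h) => h.toLowerCase() in (email.headers || {}));
    }
    if (typeof expected === "boolean") return email[field] === expected;
    throw new Error(`condition field not covered by this sketch: ${field}`);
  });
}

function matchesFilter(filter, email) {
  const results = filter.conditions.map((c) => matchesCondition(c, email));
  return filter.operator === "OR" ? results.some(Boolean) : results.every(Boolean);
}

const filter = {
  conditions: [
    { from: { regexp: { flags: "i", pattern: ".*@amazon\\.com" } } },
    { conditions: [{ isUnread: true }, { minSize: 100000 }], operator: "OR" },
  ],
  operator: "AND",
};
const email = {
  from: "orders@Amazon.com",
  size: 2048,
  isUnread: true,
  headers: { "domainkey-signature": "a=rsa-sha1" },
};
console.log(matchesFilter(filter, email)); // true
```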

The fields for conditions match the properties of an email as they are defined in the JMAP protocol:

  • from (regexp): A JSON object of the form {"regexp": { "flags": "i", "pattern": ".*@amazon\\.com"}}. The flags specify the regular expression flags and the pattern is the regular expression to match against.
  • to (regexp): Same as from.
  • cc (regexp): Same as from.
  • bcc (regexp): Same as from.
  • subject (regexp): Same as from.
  • body (regexp): Same as from. Searches across textBody and htmlBody.
  • text (regexp): Same as from. A shorthand for a search over all the string fields above.
  • date (String): A date range in the format "between Date|FreeDate and Date|FreeDate", where:
      FreeDate = "N Duration before Date"
      N = Number
      Duration = day(s)|week(s)|month(s)|year(s)
      Date = start|now|RFC3339 String
      start = shorthand for 00:00:00 UTC on 1 January 1970
      now = shorthand for the UTC time on the server when the filter is processed
      RFC3339 String = a string in RFC3339 format
    IMPORTANT: Note the use of before in all our dates. Since we are working with email archives, all our references are before a particular date.
  • minSize (Number): The minimum RFC822 size of an email, in bytes.
  • maxSize (Number): The maximum RFC822 size of an email, in bytes.
  • isFlagged (Boolean)
  • isUnread (Boolean)
  • isAnswered (Boolean)
  • isDraft (Boolean)
  • header (String): The examples on this page pass an array of header names, e.g. ["Domainkey-Signature"].
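To make the date grammar in the table above concrete, the following hypothetical parser turns a date condition into a pair of UTC timestamps in milliseconds. It is a sketch under stated assumptions: month and year offsets use UTC calendar arithmetic, RFC3339 strings are parsed with Date.parse, and the two bounds are returned in ascending order, since nothing in the grammar says which bound comes first.

```javascript
// Hypothetical parser for the date condition grammar:
//   between <Date|FreeDate> and <Date|FreeDate>
//   FreeDate = "N Duration before Date", Duration = day(s)|week(s)|month(s)|year(s)
//   Date     = start | now | RFC3339 string
// Assumptions: UTC calendar arithmetic, bounds returned in ascending order.
function parseDateBound(text, now) {
  const free = /^(\d+)\s+(day|week|month|year)s?\s+before\s+(.+)$/i.exec(text);
  if (free) {
    const n = Number(free[1]);
    const d = new Date(parseDateBound(free[3], now));
    const unit = free[2].toLowerCase();
    if (unit === "day") d.setUTCDate(d.getUTCDate() - n);
    if (unit === "week") d.setUTCDate(d.getUTCDate() - 7 * n);
    if (unit === "month") d.setUTCMonth(d.getUTCMonth() - n);
    if (unit === "year") d.setUTCFullYear(d.getUTCFullYear() - n);
    return d.getTime();
  }
  if (text === "start") return 0; // 00:00:00 UTC on 1 January 1970
  if (text === "now") return now.getTime();
  const ms = Date.parse(text); // RFC3339 timestamps with a "Z" or numeric offset
  if (Number.isNaN(ms)) throw new Error(`unrecognised date: ${text}`);
  return ms;
}

function parseDateRange(expr, now = new Date()) {
  const m = /^between\s+(.+?)\s+and\s+(.+)$/i.exec(expr.trim());
  if (!m) throw new Error(`not a date range: ${expr}`);
  const a = parseDateBound(m[1].trim(), now);
  const b = parseDateBound(m[2].trim(), now);
  return [Math.min(a, b), Math.max(a, b)];
}

const now = new Date("2015-03-15T12:00:00Z");
const [from, to] = parseDateRange("between 2 weeks before now and now", now);
console.log(new Date(from).toISOString(), new Date(to).toISOString());
// 2015-03-01T12:00:00.000Z 2015-03-15T12:00:00.000Z
```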

Additional fields at the same level as the filter block:

  • inMailbox (String): Selects emails from the specified IMAP folder of the email account. Available options are "all", "inbox", "sent", "drafts" and "important". Defaults to "all" if the field is omitted.
  • wants (Array of Strings and/or Objects): Opt-in for the following features: "archive", "textBody", "htmlBody", "strippedHtmlBody" and "attachments". Adding a feature name to the array as a String, e.g. "archive", enables that feature. The String name is a shortcut for the more verbose Object form { "feature": "featureName" }, e.g. { "feature": "archive" }. Unlike the short String form, the Object form lets you configure a feature if necessary. Currently only the "attachments" feature supports optional configuration.

This is a list of supported features:

  • "archive" will allow you to process all older emails that match your filter.
  • "textBody", "htmlBody" and "strippedHtmlBody" will populate the corresponding fields of the JMAP object you receive.
  • "attachments" will allow you to access all attachments associated with the email. By default, the attachment contents are written to a large-storage folder named attachments. Enabling the "attachments" feature with the Object form lets you set the name under which attachments are stored in large storage. For example, { "feature": "attachments", "options": { "bucket": "myBucket" }} enables attachments and stores them under "myBucket". Additionally, the node that processes the attachments should request a large-storage QoS for the attachments folder (see QoS for more information). The available JMAP attachment fields are listed in the JMAP section.

"wants" field

All the features in wants are opt-in. If you omit the wants section, you will not receive any email body fields or be able to process older emails. You will, however, always be able to process new emails.

A feature in wants can be enabled by adding the feature's name to the array as a String (short form) or by adding an Object, which lets you set additional configuration for the feature, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }}. The short form and the Object form can be mixed.
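The short and Object forms of wants can be folded into a single shape before use. The hypothetical normaliseEmailPort helper below is a sketch, not part of the Sift tooling: it converts every entry to the { feature, options } form, rejects feature names outside the list above, applies the documented "all" default for inMailbox, and resolves the attachments bucket, falling back to the default name attachments.

```javascript
// Hypothetical helper: normalise an email port's "wants" array to the verbose
// { feature, options } form and resolve where attachments would be stored.
// The "attachments" bucket and "all" mailbox defaults come from the text above.
const KNOWN_FEATURES = ["archive", "textBody", "htmlBody", "strippedHtmlBody", "attachments"];

function normaliseEmailPort(port) {
  // Short String form -> Object form; Object entries are kept as they are.
  const wants = (port.wants || []).map((w) => (typeof w === "string" ? { feature: w } : w));
  for (const w of wants) {
    if (!KNOWN_FEATURES.includes(w.feature)) throw new Error(`unknown feature: ${w.feature}`);
  }
  const attachments = wants.find((w) => w.feature === "attachments");
  return {
    inMailbox: port.inMailbox || "all",
    wants,
    attachmentsBucket: attachments ? (attachments.options?.bucket ?? "attachments") : null,
  };
}

const port = {
  inMailbox: "inbox",
  wants: ["archive", { feature: "attachments", options: { bucket: "allAmazonAttachments" } }],
};
console.log(normaliseEmailPort(port).attachmentsBucket); // allAmazonAttachments
console.log(normaliseEmailPort({}).inMailbox); // all
```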

Example of "wants" use:

"emails": {
  "port1": {
    "filter": {
      "conditions": [{
        "from": {
          "regexp": {
            "flags": "i",
            "pattern": ".*@amazon\\.com" }}
        },{
          "date": "between now and 2015-06-01T00:00:00Z"
        },{
          "minSize": 100000
        },{
          "isUnread": true
        },{
          "header":["Domainkey-Signature"]
        }],
      "operator": "AND"
    },
    "inMailbox": "inbox",
    "wants": ["archive", {
      "feature": "attachments",
      "options": {
        "bucket": "allAmazonAttachments"
      }
    }]
  }
}

Slack Bot Inputs

Sifts can also take input from bots. At the moment only the Slack integration is complete, but the infrastructure supports adding more integrations in the future. Setting up an input from a bot is a bit more involved, hence the extra _config field.

Available fields under _config:

  • ambientMentionDuration (number of seconds, default: 300): The maximum time that a bot continues to receive ambient messages after it has been @mentioned. Ambient messages are messages on a channel that the bot can listen to but that do not mention the bot in any way.
  • permission (string, default: "personal"): Defines who is allowed to interact with the deployed bot. Available options are "personal" and "team".
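One way to picture ambientMentionDuration is as a per-channel time window that opens whenever the bot is @mentioned. The AmbientWindow class below is a hypothetical sketch of that rule; keeping one window per channel, and measuring time in milliseconds, are assumptions rather than documented behaviour.

```javascript
// Hypothetical sketch of the ambientMentionDuration rule: after the bot is
// @mentioned in a channel, ambient messages in that channel are accepted for
// the configured number of seconds (default 300). Per-channel tracking is an
// assumption of this sketch.
class AmbientWindow {
  constructor(ambientMentionDuration = 300) {
    this.durationMs = ambientMentionDuration * 1000;
    this.lastMention = new Map(); // channel id -> time of last @mention (ms)
  }

  recordMention(channel, timeMs) {
    this.lastMention.set(channel, timeMs);
  }

  acceptsAmbient(channel, timeMs) {
    const mentionedAt = this.lastMention.get(channel);
    if (mentionedAt === undefined) return false; // never mentioned here
    return timeMs >= mentionedAt && timeMs - mentionedAt <= this.durationMs;
  }
}

const w = new AmbientWindow(); // default: 300 seconds
w.recordMention("C123", 0);
console.log(w.acceptsAmbient("C123", 299_000), w.acceptsAmbient("C123", 301_000)); // true false
```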

The rest of the structure is similar to email inputs. You can have any number of bot ports, but each bot port can have only one filter. A filter consists of an array of conditions and an operator (AND or OR) that determines how those conditions are combined.

Conditions can be nested by adding objects with the same structure to the conditions array, i.e. {"conditions": [], "operator": ""}.

The available conditions that can be used are:

  • type (string, default: matches everything): One or more (comma-separated) of "direct_mention", "direct_message", "ambient", "message" and "mention".
  • text (regexp): A JSON object of the form {"regexp": { "flags": "i", "pattern": ".*"}}. The flags specify the regular expression flags and the pattern is the regular expression to match against. It is matched against the text of each incoming message.
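The bot conditions combine the same way as email conditions. The following hypothetical sketch evaluates type and text conditions, including nested { conditions, operator } objects. It assumes each message carries a single type string and treats type as a plain comma-separated list; the example at the top of the page writes the value as "message:direct_mention,direct_message", and since the meaning of the "message:" prefix is not described here, the sketch does not handle it.

```javascript
// Hypothetical sketch of bot condition matching. Assumptions: each message has
// a single `type` and a `text`; "type" is a plain comma-separated list (an
// omitted type matches everything); "text" is { regexp: { pattern, flags } }.
function matchesBotCondition(cond, msg) {
  let ok = true;
  if (cond.type !== undefined) {
    const allowed = cond.type.split(",").map((t) => t.trim());
    ok = ok && allowed.includes(msg.type);
  }
  if (cond.text !== undefined) {
    const { pattern, flags = "" } = cond.text.regexp;
    ok = ok && new RegExp(pattern, flags).test(msg.text);
  }
  return ok;
}

function matchesBotFilter(filter, msg) {
  const results = filter.conditions.map((c) =>
    Array.isArray(c.conditions) ? matchesBotFilter(c, msg) : matchesBotCondition(c, msg)
  );
  return filter.operator === "OR" ? results.some(Boolean) : results.every(Boolean);
}

const filter = {
  conditions: [
    { type: "direct_mention,direct_message" },
    { text: { regexp: { pattern: "deploy", flags: "i" } } },
  ],
  operator: "AND",
};
console.log(matchesBotFilter(filter, { type: "direct_message", text: "Please DEPLOY now" })); // true
console.log(matchesBotFilter(filter, { type: "ambient", text: "deploy" })); // false
```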

RPC Inputs

With RPC inputs, a Sift can expose parts of its DAG as a REST API and respond to the requests made against it. The available options are:

  • methods (array): A list of HTTP verbs to serve for the specified path.
  • path (string): The path for your endpoint. It can be static, like /simple, or dynamic, like /static/+ for one extra segment or /static/* for multiple extra segments.
  • CORS (object): Lets you populate the following supported headers: Origin, ExposeHeaders, MaxAge, AllowHeaders.
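The path patterns can be illustrated with a small matcher. This is a hypothetical sketch, not the router Sifts actually use: + matches exactly one extra segment, and * is assumed to appear only as the last segment and to match one or more extra segments. Query strings are ignored. The function returns the captured segments, or null when the path does not match.

```javascript
// Hypothetical sketch of RPC path matching: "/simple" is static, "+" matches
// exactly one segment, "*" (assumed last, one or more segments) matches the rest.
function matchRpcPath(pattern, requestPath) {
  const pat = pattern.split("/").filter(Boolean);
  const req = requestPath.split("?")[0].split("/").filter(Boolean); // drop query
  const captured = [];
  for (let i = 0; i < pat.length; i++) {
    if (pat[i] === "*") {
      // "*" must be the final pattern segment and needs at least one segment.
      if (i !== pat.length - 1 || req.length <= i) return null;
      captured.push(...req.slice(i));
      return captured;
    }
    if (i >= req.length) return null; // request path is too short
    if (pat[i] === "+") captured.push(req[i]);
    else if (pat[i] !== req[i]) return null; // static segment mismatch
  }
  return req.length === pat.length ? captured : null;
}

console.log(matchRpcPath("/simple", "/simple")); // []
console.log(matchRpcPath("/edig/txt/+", "/edig/txt/redsift.com")); // [ 'redsift.com' ]
console.log(matchRpcPath("/static/*", "/static/a/b")); // [ 'a', 'b' ]
console.log(matchRpcPath("/static/+", "/static/a/b")); // null
```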

Implementation notes for nodes

For a detailed overview of all the available fields, see the RPC section of the docs. Here is a quick overview of what an RPC request looks like and how to construct a response.

  • For nodes that receive an RPC input, the parsed value from got.in.data has a structure similar to this:
{
 "remote_addr": "[::1]:56279",
 "method": "GET",
 "request_uri": "/edig/txt/redsift.com",
 "header": {
  "Accept": [
   "*/*"
  ],
  "User-Agent": [
   "curl/7.43.0"
  ]
 },
 "body": ""
}
  • For nodes that output to the _rpc bucket, the response to the RPC request needs to follow a structure similar to this:
{
  "name": "internal name for _rpc bucket",
  "key": "key of incoming request", // e.g. got.in.data[0].key, so the response matches the request
  "value": {
   "status_code": 200,
   "header": {
    "Content-Type": [
     "text/plain; charset=utf-8"
    ]
   }, // note: header is a map of string to [string]
   "body": "InNvbWV0aGluZyI=" // base64-encoded payload
  }
}
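A node answering an RPC request has to echo the request key and base64-encode its body. The hypothetical buildRpcResponse helper below builds an object in the response shape shown above. The bucket name is passed in, since its value is specific to your Sift, and how the node emits the object is not shown. Encoding the JSON string "something" yields the same body as the example above.

```javascript
// Hypothetical helper: build an _rpc response object in the shape shown above.
// `bucketName` stands in for the internal name of your _rpc bucket, and
// `requestKey` is the key of the incoming request (e.g. got.in.data[0].key).
function buildRpcResponse(bucketName, requestKey, payload, statusCode = 200) {
  const body = JSON.stringify(payload);
  return {
    name: bucketName,
    key: requestKey, // must match the incoming request's key
    value: {
      status_code: statusCode,
      // header is a map of string -> array of strings
      header: { "Content-Type": ["application/json; charset=utf-8"] },
      body: Buffer.from(body, "utf8").toString("base64"), // base64-encoded payload
    },
  };
}

const response = buildRpcResponse("myRpcBucket", "request-key-1", "something");
console.log(response.value.body); // InNvbWV0aGluZyI=
```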
