Sift inputs define the data that is sent into a Sift from external systems. The currently supported input types are webhooks, rpc, emails and slack-bot. A DAG may consume any combination of inputs. Input names must be unique within a DAG.

Example:

{
  "inputs": {
    "webhooks": {
      "#": "myWebhook below is now available as a bucket.",
      "myWebhook": {
        "inbound": {
          "uri": "{key}/{value}"
        }
      }
    },
    "rpc":{
      "simple_rpc":{
        "methods": ["GET"],
        "path": "/simple",
        "CORS":{}
      }
    },
    "emails": {
      "#1": "amazon below is now available as a bucket.",
      "amazon": {
        "filter": {
          "conditions": [
            {
              "from": {
                "regexp": {
                  "flags": "i",
                  "pattern": ".*@amazon\\.com"
                }
              }
            },
            {
              "date": "between now and 2015-06-01T00:00:00Z"
            },
            {
              "minSize": 100000
            },
            {
              "isUnread": true
            },
            {
              "header": [
                "Domainkey-Signature"
              ]
            }
          ],
          "operator": "AND"
        },
        "inMailbox": "inbox"
      },
      "#2": "paypal below is another bucket with a different filter.",
      "paypal": {
        "...": {}
      }
    },
    "slack-bot": {
      "_config": {
        "ambientMentionDuration": 300,
        "permission": "personal"
      },
      "#": "slackall is a bucket with all the DMs in your slack channel",
      "slackall": {
        "filter": {
          "conditions": [
            {
              "type": "message:direct_mention,direct_message"
            },
            {
              "text": {
                "regexp": {
                  "pattern": ".*",
                  "flags": "i"
                }
              }
            }
          ],
          "operator": "AND"
        }
      }
    }
  }
}

Webhooks Inputs

The only available webhook type is "inbound", as in the example above. It supports the following fields:

Field name | Description
--- | ---
uri | The URI for the webhook, specified using URI Templates. More info on secure webhooks over HTTPS here.
jsonPath | Lets you assemble a JSON payload in the form.
response | Allows you to customise the Webhook response.

The Webhook input implementation is explained in more detail in the dedicated Webhooks section.

Email Inputs

You can have any number of email ports, but each email port can have only one filter. A filter consists of an array of conditions and an operator (AND or OR) that determines how those conditions are combined.

A condition is formed by a property of the input type (e.g. the from field of an email in the example above), which is evaluated according to its type:

  • string fields are matched against a regular expression,
  • date fields are compared with a date range string (the format is described below),
  • email size is compared with the number of bytes in RFC822 format,
  • boolean fields are compared with the given value, and
  • headers are checked for the existence of the specified headers and their values.

🚧

Filter

If an email input port doesn’t have a filter defined, no data will be returned.

Nesting of conditions can be achieved by adding objects in the conditions array following the same structure, i.e. {"conditions": [], "operator": ""}
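
To make the nesting rule concrete, here is a minimal Python sketch of how such a filter could be evaluated. It is only an illustration of the structure described above, not the platform's implementation: the function names, the flattened email dictionary (with from as a plain string, size in bytes and a headers map) and the use of a substring regular expression search are all assumptions, and date conditions are left out.

```python
import re


def check(field, expected, email):
    """Evaluate one leaf condition against a simplified email dict (assumed shape)."""
    if isinstance(expected, dict) and "regexp" in expected:
        spec = expected["regexp"]
        # Only the "i" flag is handled in this sketch.
        flags = re.IGNORECASE if "i" in spec.get("flags", "") else 0
        return re.search(spec["pattern"], email.get(field, ""), flags) is not None
    if field == "minSize":
        return email.get("size", 0) >= expected
    if field == "maxSize":
        return email.get("size", 0) <= expected
    if field == "header":
        # Assumption: a header condition passes when every listed header exists.
        return all(name in email.get("headers", {}) for name in expected)
    # Boolean fields such as isUnread are compared with the given value.
    return email.get(field) == expected


def matches(filter_spec, email):
    """Combine conditions with the filter's operator, recursing into nested filters."""
    results = []
    for condition in filter_spec["conditions"]:
        if "conditions" in condition:
            results.append(matches(condition, email))
        else:
            results.append(all(check(f, v, email) for f, v in condition.items()))
    combine = all if filter_spec["operator"].upper() == "AND" else any
    return combine(results)


email = {"from": "orders@AMAZON.com", "size": 150000, "isUnread": True,
         "headers": {"Domainkey-Signature": "example"}}
nested = {
    "conditions": [
        {"from": {"regexp": {"flags": "i", "pattern": ".*@amazon\\.com"}}},
        {"conditions": [{"isUnread": False}, {"minSize": 100000}], "operator": "OR"},
    ],
    "operator": "AND",
}
result = matches(nested, email)
```

Here the inner OR group passes because the size condition holds even though the isUnread condition does not, so the outer AND passes as well.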

The fields for conditions match the properties of an email as they are defined in the JMAP protocol:

Field | Type and information
--- | ---
from | regexp - A JSON object of the form {"regexp": { "flags": "i", "pattern": ".*@amazon\\.com"}}. The flags specify the regular expression flags and the pattern is the regular expression to match against.
to | regexp - same as from
cc | regexp - same as from
bcc | regexp - same as from
subject | regexp - same as from
body | regexp - same as from. Searches across textBody and htmlBody.
text | regexp - same as from. This is a shorthand for a search over all the above string fields.
date | String - a date range; the format is described below the table.
minSize | Number - RFC822 size in bytes of an email
maxSize | Number - RFC822 size in bytes of an email
isFlagged | Boolean
isUnread | Boolean
isAnswered | Boolean
isDraft | Boolean
header | String

The date range format is "between Date|FreeDate and Date|FreeDate", where:

FreeDate = "N Duration before Date"
N = Number
Duration = day(s)|week(s)|month(s)|year(s)
Date = "start|now|RFC3339 String"
start = shorthand for 00:00:00 UTC on 1 January 1970
now = shorthand for UTC time on the server when the filter is processed
RFC3339 String = String in RFC3339 format

IMPORTANT: Please note the use of before in all our dates. Since we are working with email archives, all our references are before a particular date.
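
As an illustration of the date range format, the following Python sketch resolves a range string into two timestamps. It is a reading of the grammar above under stated assumptions, not the platform's parser: the function names are hypothetical, months and years are approximated as 30 and 365 days, and the two bounds are returned in chronological order regardless of the order in which they were written.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # "start"
# Assumption: month and year lengths are approximated in this sketch.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}
FREE_DATE = re.compile(r"^(\d+) (day|week|month|year)s? before (.+)$")
RANGE = re.compile(r"^between (.+) and (.+)$")


def parse_date(token, now):
    """Resolve a Date: "start", "now" or an RFC3339 string."""
    if token == "start":
        return EPOCH
    if token == "now":
        return now
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(token.replace("Z", "+00:00"))


def parse_bound(token, now):
    """Resolve either a FreeDate ("N Duration before Date") or a plain Date."""
    match = FREE_DATE.match(token)
    if match:
        count, unit, base = match.groups()
        return parse_date(base, now) - timedelta(days=int(count) * DAYS_PER_UNIT[unit])
    return parse_date(token, now)


def parse_range(expression, now=None):
    """Return (earliest, latest) for a "between ... and ..." expression."""
    now = now or datetime.now(timezone.utc)
    match = RANGE.match(expression)
    if not match:
        raise ValueError("not a date range: " + expression)
    first, second = (parse_bound(token, now) for token in match.groups())
    return min(first, second), max(first, second)


fixed_now = datetime(2016, 1, 1, tzinfo=timezone.utc)
example = parse_range("between now and 2015-06-01T00:00:00Z", now=fixed_now)
recent = parse_range("between 2 weeks before now and start", now=fixed_now)
```

Passing now explicitly, as in the usage lines, keeps the result reproducible; in the real service now is the server's UTC time when the filter is processed.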

Additional fields at the same level as the filter block:

Field | Type and information
--- | ---
inMailbox | String - selects emails from the given IMAP folder of the email account. Available options are "all", "inbox", "sent", "drafts" and "important". Defaults to "all" if the field is omitted.
wants | Array of Strings and/or Objects - opts in to the following features: "extensions", "archive", "headers", "textBody", "htmlBody", "strippedHtmlBody", "attachments". Adding a feature name, e.g. "archive", to the array as a String enables that feature. The String form is a shortcut for the more verbose Object form { "feature": "featureName" }, e.g. { "feature": "archive" }. Unlike the String form, the Object form also lets you configure a feature where needed. Currently only the "attachments" feature supports optional configuration.

This is a list of supported features:

  • "archive" allows you to process all older emails that match your filter.
  • "textBody", "htmlBody" and "strippedHtmlBody" populate the corresponding fields in the JMAP object you receive.
  • "attachments" allows you to access all attachments associated with the email. The attachment contents are written to a large-storage folder named attachments by default. Enabling the "attachments" feature with the Object form lets you set the name under which attachments are stored in the large-storage folder, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }} enables attachments and stores them under "myBucket". Additionally, the node that processes the attachments should request a large-storage QoS for the attachments folder (see QoS for more information). The list of available JMAP attachment fields is outlined in the JMAP section.

Supported features in "wants"

wants | JMAP attributes | Description
--- | --- | ---
(missing or empty) | id, threadId, mailboxIds, headers, subject, from, to, cc, bcc, replyTo, inReplyToMessageID, date, size | If wants is empty or not specified, it defaults to extensions and headers.
headers | headers, subject, from, to, cc, bcc, replyTo, inReplyToMessageID, date, size, hasAttachment | Returns all the headers contained in the RFC822 message.
flags | isUnread, isFlagged, isAnswered, isDraft | Returns all the message flags.
extensions | id, threadId, mailboxIds | Returns the IMAP extensions for the provider, in this case Gmail.
preview | preview | The textBody (if present) trimmed to 256 characters.
textBody | textBody | The text body.
htmlBody | htmlBody | The HTML body.
strippedHtmlBody | strippedHtmlBody | The stripped-down version of the HTML body (tags and formatting removed).
attachments | attachments | Allows you to access all attachments associated with the email. The attachment contents are written to a large-storage folder named attachments by default. Enabling the "attachments" feature with the Object form lets you set the name under which attachments are stored in the large-storage folder, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }} enables attachments and stores them under "myBucket". Additionally, the node that processes the attachments should request a large-storage QoS for the attachments folder (see QoS for more information). The list of available JMAP attachment fields is outlined in the JMAP section.
archive | No attributes as such | Allows processing of all older emails that match your filter.

📘

"wants" field

All the fields in wants are purely opt-in. If you do not specify the wants section then you will not receive any email body fields or be able to process older emails. However you will always be able to process new emails.

A feature in wants can be enabled by adding the feature's name to the array as a String (short form) or by adding an Object to the array, which lets you set additional configuration for the feature, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }}. The short form and the Object form can be mixed.

Example of "wants" use:

"emails": {
  "port1": {
    "filter": {
      "conditions": [
        {
          "from": {
            "regexp": {
              "flags": "i",
              "pattern": ".*@amazon\\.com"
            }
          }
        },
        {
          "date": "between now and 2015-06-01T00:00:00Z"
        },
        {
          "minSize": 100000
        },
        {
          "isUnread": true
        },
        {
          "header": ["Domainkey-Signature"]
        }
      ],
      "operator": "AND"
    },
    "inMailbox": "inbox",
    "wants": [
      "archive",
      {
        "feature": "attachments",
        "options": {
          "bucket": "allAmazonAttachments"
        }
      }
    ]
  }
}
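
Because the short String form and the Object form of a wants entry can be mixed, it can help to think of both as the same record. The Python sketch below shows one way to normalise a wants array into a map from feature name to options. The function names are hypothetical and this is not how the platform parses the field; the only behaviour taken from the documentation is that attachments are stored under "attachments" unless a bucket option is given.

```python
def normalise_wants(wants):
    """Map each requested feature to its options; the String form means no options."""
    features = {}
    for entry in wants:
        if isinstance(entry, str):
            features[entry] = {}
        else:
            features[entry["feature"]] = entry.get("options", {})
    return features


def attachments_bucket(features):
    """Return the attachments folder name, or None if attachments were not requested."""
    if "attachments" not in features:
        return None
    # The documented default folder name is "attachments".
    return features["attachments"].get("bucket", "attachments")


wants = ["archive", {"feature": "attachments",
                     "options": {"bucket": "allAmazonAttachments"}}]
features = normalise_wants(wants)
bucket = attachments_bucket(features)
```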

Slack Bot Inputs

Bots are another source of input that Sifts can interface with. At the moment only the Slack integration is complete, but the infrastructure is in place for further integrations. Setting up an input from a bot is a bit more involved, hence the extra _config field.

Available fields under _config:

Field | Type and information
--- | ---
ambientMentionDuration | number of seconds - (default: 300) Maximum time that a bot continues to receive ambient messages after it has been @mentioned. Ambient messages are messages that the bot can listen to on a channel, but that do not mention the bot in any way.
permission | string - (default: "personal") Defines who is allowed to interact with the deployed bot. Available options are "personal" and "team".

The rest of the structure is similar to email inputs. You can have any number of bot ports, but each bot port can have only one filter. A filter consists of an array of conditions and an operator (AND or OR) that determines how those conditions are combined.

Nesting of conditions can be achieved by adding objects in the conditions array following the same structure, i.e. {"conditions": [], "operator": ""}

The available conditions that can be used are:

Field | Type and information
--- | ---
type | string - (default: matches everything) can be one or more (comma separated) of "direct_mention", "direct_message", "ambient", "message", "mention"
text | regexp - A JSON object of the form {"regexp": { "flags": "i", "pattern": ".*"}}. The flags specify the regular expression flags and the pattern is the regular expression to match against. Matches against any of the incoming messages.

RPC Inputs

Sifts that respond to triggers from this type of input can expose parts of their DAG under a REST API. The available options to define them are the following:

Field | Type and information
--- | ---
methods | array - list of HTTP verbs to serve for the specified path
path | string - allows you to specify the path for your endpoint. Can be static like /simple or dynamic like /static/+ for one extra segment and /static/* for multiple extra segments.
CORS | object - allows you to populate the following supported headers: Origin, ExposeHeaders, MaxAge, AllowHeaders
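
The three path forms can be illustrated with a small matcher. The Python sketch below is one possible reading of the rules above, not the router the platform uses: the function name is hypothetical, and it assumes that + stands for exactly one extra segment and that * stands for one or more extra segments.

```python
def path_matches(pattern, path):
    """Check a request path against a static, "+" or "*" pattern (assumed semantics)."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    for index, part in enumerate(pattern_parts):
        if part == "*":
            # Assumption: "*" needs at least one extra segment and accepts any number.
            return len(path_parts) > index
        if index >= len(path_parts):
            return False
        if part != "+" and part != path_parts[index]:
            return False
    # Without a trailing "*" the segment counts must agree.
    return len(path_parts) == len(pattern_parts)


static_hit = path_matches("/simple", "/simple")
single_hit = path_matches("/static/+", "/static/a")
single_miss = path_matches("/static/+", "/static/a/b")
multi_hit = path_matches("/static/*", "/static/a/b")
```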

Implementation notes for nodes

The RPC section of the docs gives a detailed overview of all the available fields. Here is a quick overview of what an rpc request looks like and how you can construct a response.

  • For nodes that receive an rpc input, the parsed value from got.in.data will have a structure similar to this:
{
 "remote_addr": "[::1]:56279",
 "method": "GET",
 "request_uri": "/edig/txt/redsift.com",
 "header": {
  "Accept": [
   "*/*"
  ],
  "User-Agent": [
   "curl/7.43.0"
  ]
 },
 "body": ""
}
  • For nodes that output to the _rpc bucket, the response to the rpc request needs to follow a structure similar to this:
{
  "name": "internal name for _rpc bucket", 
  "key": "key of incoming request", // e.g. got.in.data[0].key to match response
  "value":{
   "status_code": 200,
   "header": {
    "Content-Type": [
     "text/plain; charset=utf-8"
    ]
   }, // note: the header object is a map of string to [string]
   "body": "InNvbWV0aGluZyI=" //base64 encoded payload
  }
}
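
The response structure above can be assembled programmatically. The following Python sketch builds such an object and base64-encodes the body; it only illustrates the shape of the data and does not show the node API itself. The function name and its parameters are hypothetical, and JSON-encoding the payload before base64-encoding it is an assumption, chosen because the sample body above decodes to the JSON string "something".

```python
import base64
import json


def build_rpc_response(bucket_name, request_key, payload, status_code=200,
                       content_type="text/plain; charset=utf-8"):
    """Build a response object for the _rpc bucket with a base64-encoded body."""
    # Assumption: the payload is serialised as JSON before being base64-encoded.
    body = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {
        "name": bucket_name,   # internal name for the _rpc bucket
        "key": request_key,    # key of the incoming request, so the response can be matched
        "value": {
            "status_code": status_code,
            # The header object maps each header name to a list of strings.
            "header": {"Content-Type": [content_type]},
            "body": body,
        },
    }


response = build_rpc_response("rpc_out", "request-key", "something")
```

With the payload "something", the body field comes out as InNvbWV0aGluZyI=, the same value as in the example above.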