Inputs
Sift inputs define the data that is sent into a Sift from external systems. At the moment they may be specified as webhooks, rpc, emails or slack-bot. A DAG may consume any combination of inputs. Input names must be unique in a DAG.
Example:
{
  "inputs": {
    "webhooks": {
      "#": "myWebhook below is now available as a bucket.",
      "myWebhook": {
        "inbound": {
          "uri": "{key}/{value}"
        }
      }
    },
    "rpc": {
      "simple_rpc": {
        "methods": ["GET"],
        "path": "/simple",
        "CORS": {}
      }
    },
    "emails": {
      "#1": "amazon below is now available as a bucket.",
      "amazon": {
        "filter": {
          "conditions": [
            {
              "from": {
                "regexp": {
                  "flags": "i",
                  "pattern": ".*@amazon\\.com"
                }
              }
            },
            {
              "date": "between now and 2015-06-01T00:00:00Z"
            },
            {
              "minSize": 100000
            },
            {
              "isUnread": true
            },
            {
              "header": [
                "Domainkey-Signature"
              ]
            }
          ],
          "operator": "AND"
        },
        "inMailbox": "inbox"
      },
      "#2": "paypal below is another bucket with a different filter.",
      "paypal": {
        "...": {}
      }
    },
    "slack-bot": {
      "_config": {
        "ambientMentionDuration": 300,
        "permission": "personal"
      },
      "#": "slackall is a bucket with all the DM's in your slack channel",
      "slackall": {
        "filter": {
          "conditions": [
            {
              "type": "message:direct_mention,direct_message"
            },
            {
              "text": {
                "regexp": {
                  "pattern": ".*",
                  "flags": "i"
                }
              }
            }
          ],
          "operator": "AND"
        }
      }
    }
  }
}
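The uniqueness rule for input names can be checked before deploying a Sift. The following is a minimal Python sketch, not part of the platform: it assumes that names must be unique across all input types, that keys starting with "#" are comments (as in the example above), and that "_config" is a settings block rather than a port. The helper names are hypothetical.

```python
from collections import Counter


def input_names(inputs):
    """Yield (input_type, name) for every port, skipping comment and config keys."""
    for input_type, ports in inputs.items():
        for name in ports:
            # Assumption: "#..." keys are comments and "_config" holds settings.
            if name.startswith("#") or name == "_config":
                continue
            yield input_type, name


def duplicate_input_names(inputs):
    """Return the sorted list of port names that are used more than once."""
    counts = Counter(name for _, name in input_names(inputs))
    return sorted(name for name, n in counts.items() if n > 1)


inputs = {
    "webhooks": {"#": "comment", "myWebhook": {"inbound": {"uri": "{key}/{value}"}}},
    "rpc": {"simple_rpc": {"methods": ["GET"], "path": "/simple", "CORS": {}}},
    "emails": {"amazon": {}, "paypal": {}},
    "slack-bot": {"_config": {"permission": "personal"}, "slackall": {}},
}
print(duplicate_input_names(inputs))
```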
Webhooks Inputs
The only available webhook type is "inbound", as in the example above. It supports the following fields:
Field name | Description |
---|---|
uri | The URI for the webhook specified using URI Templates. More info on secure webhooks over HTTPS here. |
jsonPath | Lets you assemble a JSON payload in the form. |
response | Allows you to customise the Webhook response. |
The Webhooks section explains the webhook input implementation in more detail.
Email Inputs
You can have any number of email ports, but each email port can have only one filter. The structure of a filter is an array of conditions and an operator that determines the relationship between those conditions, i.e. AND or OR.
A condition is formed by a property of the input type (e.g. the from field of an email in the example), which is evaluated according to its type:
- string fields are matched against a regular expression,
- date fields are compared with a date range string in the format described in the table below,
- email size is compared with the number of bytes in RFC822 format,
- boolean fields are compared with the given value, and
- headers are checked for the existence of the specified headers and their values.
Filter
If an email input port doesn't have a filter defined, no data will be returned.
Nesting of conditions can be achieved by adding objects to the conditions array that follow the same structure, i.e. {"conditions": [], "operator": ""}
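To make the evaluation rules concrete, here is a minimal Python sketch of how a filter with nested conditions could be evaluated against a single email. It is an illustration only, not the platform's implementation. It assumes the email is a plain dict with a `size` key for the RFC822 size, that regular expressions are searched rather than fully matched, that size bounds are inclusive, and that only the "i" flag is mapped. Date and header conditions are left out.

```python
import re


def matches_condition(condition, email):
    """Evaluate one condition, or a nested filter, against an email dict."""
    if "conditions" in condition:
        # Nested filter: {"conditions": [...], "operator": "AND" | "OR"}
        return matches_filter(condition, email)
    field, expected = next(iter(condition.items()))
    if isinstance(expected, dict) and "regexp" in expected:
        spec = expected["regexp"]
        flags = re.IGNORECASE if "i" in spec.get("flags", "") else 0
        return re.search(spec["pattern"], str(email.get(field, "")), flags) is not None
    if field == "minSize":
        return email.get("size", 0) >= expected  # assumption: inclusive bound
    if field == "maxSize":
        return email.get("size", 0) <= expected  # assumption: inclusive bound
    if isinstance(expected, bool):
        return email.get(field, False) == expected
    raise ValueError(f"unsupported condition in this sketch: {field}")


def matches_filter(flt, email):
    """Combine the results of all conditions with the filter's operator."""
    results = [matches_condition(c, email) for c in flt["conditions"]]
    if flt["operator"] == "AND":
        return all(results)
    if flt["operator"] == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {flt['operator']!r}")


flt = {
    "conditions": [
        {"from": {"regexp": {"flags": "i", "pattern": r".*@amazon\.com"}}},
        {"conditions": [{"isUnread": True}, {"minSize": 100000}], "operator": "OR"},
    ],
    "operator": "AND",
}
email = {"from": "Orders@Amazon.com", "size": 2048, "isUnread": True}
print(matches_filter(flt, email))
```

The nested OR block lets an email through when it is either unread or large, while the outer AND still requires the sender to match.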
The fields for conditions match the properties of an email as they are defined in the JMAP protocol:
Field | Type and information |
---|---|
from | regexp - A JSON object of the form {"regexp": { "flags": "i", "pattern": ".*@amazon\\.com"}}. The flags specify the regular expression flags and the pattern is a regular expression to match against. |
to | regexp - same as from |
cc | regexp - same as from |
bcc | regexp - same as from |
subject | regexp - same as from |
body | regexp - same as from. Searches across textBody and htmlBody |
text | regexp - same as from. This is a shorthand for a search over all the above string fields. |
date | String - A date range in the format "between Date|FreeDate and Date|FreeDate", where: FreeDate = "N Duration before Date"; N = Number; Duration = day(s)|week(s)|month(s)|year(s); Date = "start|now|RFC3339 String"; start = shorthand for 00:00:00 UTC on 1 January 1970; now = shorthand for UTC time on the server when the filter is processed; RFC3339 String = String in RFC3339 format. IMPORTANT: Note the use of before in all dates. Since we are working with email archives, all references are before a particular date. |
minSize | Number - RFC822 size in bytes of an email |
maxSize | Number - RFC822 size in bytes of an email |
isFlagged | Boolean |
isUnread | Boolean |
isAnswered | Boolean |
isDraft | Boolean |
header | Array of Strings - the headers to check for, e.g. ["Domainkey-Signature"] as in the example above |
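The date range grammar in the table above can be parsed with a few regular expressions. The following Python sketch is one possible reading of that grammar, not the server's parser. It assumes months and years can be approximated as 30 and 365 days, that the two endpoints may appear in either order (the examples put now first), and that RFC3339 strings carry an explicit offset or "Z". All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # "start"
# Assumption: months and years are approximated in days for this sketch.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def parse_date(text, now):
    """Parse a Date: "start", "now" or an RFC3339 string."""
    text = text.strip()
    if text == "start":
        return EPOCH
    if text == "now":
        return now
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def parse_point(text, now):
    """Parse a Date, or a FreeDate of the form "N Duration before Date"."""
    free = re.fullmatch(r"\s*(\d+)\s+(day|week|month|year)s?\s+before\s+(.+)", text)
    if free is None:
        return parse_date(text, now)
    count, unit, base = free.groups()
    return parse_date(base, now) - timedelta(days=int(count) * DAYS_PER_UNIT[unit])


def parse_range(text, now=None):
    """Return (earliest, latest) for a "between ... and ..." string."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"between\s+(.+?)\s+and\s+(.+)", text.strip())
    if match is None:
        raise ValueError(f"not a date range: {text!r}")
    first, second = (parse_point(part, now) for part in match.groups())
    return min(first, second), max(first, second)


fixed_now = datetime(2016, 1, 1, tzinfo=timezone.utc)
print(parse_range("between now and 2015-06-01T00:00:00Z", fixed_now))
print(parse_range("between 2 weeks before now and start", fixed_now))
```

Passing `now` explicitly keeps the function deterministic, which is convenient when testing filters offline.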
Additional fields at the same level as the filter block:
Field | Type and information |
---|---|
inMailbox | String - selects emails from the defined IMAP folder of the email account. Available options are "all", "inbox", "sent", "drafts" and "important". Defaults to "all" if the field is skipped. |
wants | Array of Strings and/or Objects - opts in to the following features: "extensions", "archive", "headers", "textBody", "htmlBody", "strippedHtmlBody", "attachments". Adding a feature name, e.g. "archive", to the array as a String enables that feature. The String is a shortcut for the more verbose Object form { "feature": "featureName" }, e.g. { "feature": "archive" }. Unlike the short String form, the Object form lets you configure a feature where necessary. Currently only the "attachments" feature supports optional configuration. Supported features: "archive" lets you process all older emails that match your filter. "textBody", "htmlBody" and "strippedHtmlBody" populate the corresponding fields in the JMAP object you receive. "attachments" lets you access all attachments associated with the email. The attachment contents are written to a large-storage folder named attachments by default. Enabling "attachments" with the Object form lets you set the name under which attachments are stored in the large-storage folder, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }} enables attachments and stores them under "myBucket". Additionally, the node that processes the attachments should request a large-storage QoS for the attachments folder (see QoS for more information). The available JMAP attachment fields are listed in the JMAP section. |
Supported features in "wants"
wants | jmap attributes | description |
---|---|---|
If wants is missing or is empty | id, threadId, mailboxIds, headers, subject, from, to, cc, bcc, replyTo, inReplyToMessageID, date, size | If wants is empty or not specified this will default to extensions, headers |
headers | headers, subject, from, to, cc, bcc, replyTo, inReplyToMessageID, date, size, hasAttachment | This returns all the headers contained in the RFC822 message |
flags | isUnread, isFlagged, isAnswered, isDraft | This returns all the message flags |
extensions | id, threadId, mailboxIds | This returns the IMAP extensions for the provider, in this case Gmail. |
preview | preview | The textBody (if present) trimmed to 256 characters |
textBody | textBody | The text body. |
htmlBody | htmlBody | The html body |
strippedHtmlBody | strippedHtmlBody | The stripped down version (tags and formatting removed) of the html body |
attachments | attachments | This lets you access all attachments associated with the email. The attachment contents are written to a large-storage folder named attachments by default. Enabling "attachments" with the Object form lets you set the name under which attachments are stored in the large-storage folder, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }} enables attachments and stores them under "myBucket". Additionally, the node that processes the attachments should request a large-storage QoS for the attachments folder (see QoS for more information). The available JMAP attachment fields are listed in the JMAP section. |
archive | (none) | No attributes as such, but this allows processing of all older emails that match your filter. |
"wants" field
All the fields in wants are purely opt-in. If you do not specify the wants section, you will not receive any email body fields or be able to process older emails. However, you will always be able to process new emails.
A feature in wants can be enabled by adding the feature's name as a String to the array (short form) or by adding an Object to the array, which allows additional configuration for the feature, e.g. { "feature": "attachments", "options": { "bucket": "myBucket" }}. The short form and the Object form can be mixed.
Example of "wants" use:
"emails": {
  "port1": {
    "filter": {
      "conditions": [{
        "from": {
          "regexp": {
            "flags": "i",
            "pattern": ".*@amazon\\.com"
          }
        }
      }, {
        "date": "between now and 2015-06-01T00:00:00Z"
      }, {
        "minSize": 100000
      }, {
        "isUnread": true
      }, {
        "header": ["Domainkey-Signature"]
      }],
      "operator": "AND"
    },
    "inMailbox": "inbox",
    "wants": ["archive", {
      "feature": "attachments",
      "options": {
        "bucket": "allAmazonAttachments"
      }
    }]
  }
}
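Because the short String form and the Object form can be mixed, tooling that reads a Sift definition may find it convenient to normalise wants first. The Python sketch below shows one way to do that. The function names are hypothetical, and the default folder name "attachments" is taken from the description of the attachments feature above.

```python
def normalize_wants(wants):
    """Expand short String entries to the Object form {"feature": name}."""
    normalized = []
    for entry in wants or []:
        if isinstance(entry, str):
            normalized.append({"feature": entry})
        elif isinstance(entry, dict) and "feature" in entry:
            normalized.append(dict(entry))
        else:
            raise ValueError(f"invalid wants entry: {entry!r}")
    return normalized


def attachments_bucket(wants):
    """Return the attachments folder name, or None if attachments are not enabled."""
    for entry in normalize_wants(wants):
        if entry["feature"] == "attachments":
            return entry.get("options", {}).get("bucket", "attachments")
    return None


wants = ["archive", {"feature": "attachments", "options": {"bucket": "allAmazonAttachments"}}]
print(normalize_wants(wants))
print(attachments_bucket(wants))
```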
Slack Bot Inputs
Bots are another source of input that Sifts can interface with. At the moment only the Slack integration is complete, but the infrastructure is in place for future integrations. Setting up an input from a bot is a bit more involved, hence the extra field _config.
Available fields under _config:
Field | Type and information |
---|---|
ambientMentionDuration | number of seconds - (default: 300) Maximum time that a bot continues to receive ambient messages after it has been @mentioned. Ambient messages are messages that the bot can listen to on a channel, but that do not mention the bot in any way |
permission | string - (default: "personal") available options: "personal", "team". Defines who is allowed to interact with the deployed bot |
The rest of the structure is quite similar to email inputs. You can have any number of bot ports, but each bot port can have only one filter. The structure of a filter is an array of conditions and an operator that determines the relationship between those conditions, i.e. AND or OR.
Nesting of conditions can be achieved by adding objects to the conditions array that follow the same structure, i.e. {"conditions": [], "operator": ""}
The available conditions that can be used are:
Field | Type and information |
---|---|
type | string - (default: matches everything) can be one or more (comma separated) of "direct_mention", "direct_message", "ambient", "message", "mention" |
text | regexp - A JSON object of the form {"regexp": { "flags": "i", "pattern": ".*"}}. The flags specify the regular expression flags and the pattern is a regular expression to match against. Matches against any of the incoming messages. |
RPC Inputs
Sifts that respond to triggers from this type of input can expose parts of their DAG under a REST API. The available options to define them are the following:
Field | Type and information |
---|---|
methods | array - list of HTTP verbs to serve for the specified path |
path | string - allows you to specify the path for your endpoint. Can be static like /simple or dynamic like /static/+ for one extra segment and /static/* for multiple extra segments. |
CORS | object - allows you to populate the following supported headers: Origin, ExposeHeaders, MaxAge, AllowHeaders |
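The static and dynamic path forms can be illustrated with a small matcher. This Python sketch shows how such patterns could be interpreted and is not the platform's router. It assumes that wildcards only appear as the last segment and that /static/* requires at least one extra segment. The function name is hypothetical.

```python
def path_matches(pattern, request_path):
    """Match a request path against a static or wildcard RPC path."""
    pattern_parts = pattern.strip("/").split("/")
    request_parts = request_path.strip("/").split("/")
    wildcard = pattern_parts[-1]
    if wildcard not in ("+", "*"):
        return pattern_parts == request_parts  # static path: exact match
    prefix = pattern_parts[:-1]
    if request_parts[:len(prefix)] != prefix:
        return False
    extra = request_parts[len(prefix):]
    if wildcard == "+":
        return len(extra) == 1  # exactly one extra segment
    return len(extra) >= 1  # assumption: "*" needs at least one extra segment


for pattern, path in [
    ("/simple", "/simple"),
    ("/static/+", "/static/a"),
    ("/static/+", "/static/a/b"),
    ("/static/*", "/static/a/b"),
]:
    print(pattern, path, path_matches(pattern, path))
```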
Implementation notes for nodes
For a detailed overview of all the available fields, see the RPC section of the docs. Here is a quick overview of what an rpc request looks like and how you can construct a response.
- For nodes that receive an rpc input in their implementation, the parsed value from got.in.data will have a structure similar to this:
{
  "remote_addr": "[::1]:56279",
  "method": "GET",
  "request_uri": "/edig/txt/redsift.com",
  "header": {
    "Accept": [
      "*/*"
    ],
    "User-Agent": [
      "curl/7.43.0"
    ]
  },
  "body": ""
}
- For nodes that output to the _rpc bucket, the response to the rpc request needs to follow a structure similar to this:
{
  "name": "internal name for _rpc bucket",
  "key": "key of incoming request", // e.g. got.in.data[0].key to match the response
  "value": {
    "status_code": 200,
    "header": {
      "Content-Type": [
        "text/plain; charset=utf-8"
      ]
    }, // note: the header object is a map of string:[string]
    "body": "InNvbWV0aGluZyI=" // base64 encoded payload
  }
}
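The response structure above can be assembled with a small helper. The following Python sketch is an illustration under stated assumptions: the function name and the default bucket name "rpc_out" are hypothetical, and the body is treated as UTF-8 text before base64 encoding. The sample body in the structure above decodes to the text "something" including its double quotes, which is what the usage line reproduces.

```python
import base64


def rpc_response(key, body_text, status_code=200,
                 content_type="text/plain; charset=utf-8", name="rpc_out"):
    """Build a response object for the _rpc bucket.

    key should match the key of the incoming request so that the response
    is paired with it. name is a placeholder for your bucket's internal name.
    """
    body = base64.b64encode(body_text.encode("utf-8")).decode("ascii")
    return {
        "name": name,
        "key": key,
        "value": {
            "status_code": status_code,
            # Each header maps a string to a list of strings.
            "header": {"Content-Type": [content_type]},
            "body": body,
        },
    }


response = rpc_response("key-of-incoming-request", '"something"')
print(response["value"]["body"])
```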