
This is an org-level, subscription feature.

Ultra Tasks let a pipeline, known as an Ultra Pipeline, consume documents continuously from external sources. This mode suits sources that are not compatible with Triggered Tasks, as well as inputs that require low-latency processing. A task can manage multiple instances (runs) of a pipeline, all consuming documents simultaneously, which provides load balancing and reliability.


Plain HTTP/S requests can be fed into a pipeline through a FeedMaster installed as part of a Snaplex. The FeedMaster turns each HTTP request into a document and sends it to the pipeline's unlinked input. The document that the pipeline sends to an unlinked output becomes the HTTP response to the original request.

See Snaplex Installation on Linux for installing a FeedMaster on a Groundplex, and Requirements for On-premises Snaplex for the configuration a FeedMaster needs.


More than one FeedMaster can be configured. In that configuration, you will usually want a load balancer in front of the FeedMasters so that requests are distributed automatically. The load balancer can use the /healthz URL to check the health status of each FeedMaster.
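As a rough illustration of how a load balancer might use /healthz, the Node.js sketch below keeps a health flag per FeedMaster and spreads requests round-robin over the healthy ones. `FeedMasterPool` and the FeedMaster base URLs are hypothetical. It also assumes that a healthy FeedMaster answers /healthz with a 2xx status. In a real deployment you would normally configure an existing load balancer's health check instead of writing one.

```javascript
// Hypothetical load-balancer state: FeedMasters are marked healthy or
// unhealthy by periodic /healthz checks, and requests are spread
// round-robin over the ones currently marked healthy.
class FeedMasterPool {
  constructor(baseUrls) {
    this.baseUrls = baseUrls;
    this.healthy = new Map(baseUrls.map((u) => [u, true]));
    this.next = 0;
  }

  // Record the outcome of a health check for one FeedMaster.
  markHealth(baseUrl, ok) {
    this.healthy.set(baseUrl, ok);
  }

  // Return the next healthy FeedMaster in round-robin order, or null if none.
  pick() {
    const n = this.baseUrls.length;
    for (let i = 0; i < n; i++) {
      const idx = (this.next + i) % n;
      const url = this.baseUrls[idx];
      if (this.healthy.get(url)) {
        this.next = (idx + 1) % n;
        return url;
      }
    }
    return null;
  }

  // Probe every FeedMaster's /healthz URL once (Node 18+ global fetch).
  // Assumption: a healthy FeedMaster answers with a 2xx status.
  async checkAll() {
    await Promise.all(
      this.baseUrls.map(async (url) => {
        try {
          const res = await fetch(new URL("/healthz", url));
          this.markHealth(url, res.ok);
        } catch {
          this.markHealth(url, false); // unreachable counts as unhealthy
        }
      })
    );
  }
}
```

Running `checkAll()` on a timer, for example `setInterval(() => pool.checkAll(), 5000)`, keeps health probes off the request path, so `pick()` stays cheap.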


Building Pipelines for Ultra Tasks

A pipeline built for use with an Ultra Task can have either:

  • No unconnected views. In this case, the Snaplex does not need a FeedMaster. This configuration is intended for data sources that are Snaps, such as JMS Consumer.
  • One unconnected input view AND one or more unconnected output views. In this case, the Snaplex must also include a FeedMaster. This differs from Triggered Tasks, which use one unconnected input and one unconnected output.
    • The unlinked input view type can be either binary or document:
      • If it's document, the headers from the original HTTP request are at the root of the document and the request body is in the 'content' field. For example, the 'User-Agent' HTTP header can be referenced in the input document as $['user-agent']. In addition to the HTTP headers, the following fields are added to the input document:
        • uri: The original URI of the request.
        • method: The HTTP request method.
        • query: The parsed query string. Its value is an object with one field per query string parameter, and each field holds a list of all the values given for that parameter. For example, the query string:

            foo=bar&foo=baz&one=1

          results in the following query object:

            {
              "foo": ["bar", "baz"],
              "one": ["1"]
            }

        • task_name: The name of the Ultra task.
        • path_info: The part of the path after the Ultra task URL.
        • server_ip: The IP address of the FeedMaster that received the request.
        • server_port: The TCP port of the FeedMaster that received the request.
        • client_ip: The IP address of the client that sent the request.
        • client_port: The TCP port of the client that sent the request.
      • If it's binary, the HTTP headers are in the binary document's header.
    • The unlinked output view type can be binary or document as well:
      • If it's document, the output should have a 'content' field, which is JSON-encoded and sent back to the client as the response body. The other fields in the output document are sent back to the client as HTTP response headers. If there is no 'content' field, the entire document is JSON-encoded and used as the response body.
        • If the output document contains a 'status' field that is an integer, it is used as the HTTP response status code.
      • If it's binary, the binary document's header is sent back to the client as the HTTP response headers, and the body of the binary document is streamed directly back to the client.
      • SnapLogic expects exactly one output document for every input document. If no output is sent, the original HTTP request sent to the FeedMaster hangs and eventually times out. If more than one output document is generated, SnapLogic sends only the first one back as the response to the original HTTP request. This behavior differs from a Triggered Task, where all of the documents sent to the unlinked output are sent back in the response.
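The document-view rules above can be modeled outside SnapLogic to help reason about pipeline design. The Node.js sketch below follows those rules. It is not FeedMaster code, and every function name in it is hypothetical. Several details are assumptions:

- The task path is passed in so that path_info can be derived.
- The server and client address fields are omitted.
- The status code defaults to 200 when the document has no integer 'status' field.
- When a document has no 'content' field, no headers are taken from it.

```javascript
// Hypothetical model of the request/response mapping for document views.
// It mirrors the documented rules; it is not the FeedMaster implementation.

// Parse a query string into { name: [value, ...] }, keeping every value.
function parseQuery(queryString) {
  const query = {};
  for (const [name, value] of new URLSearchParams(queryString)) {
    if (!Object.prototype.hasOwnProperty.call(query, name)) query[name] = [];
    query[name].push(value);
  }
  return query;
}

// Build the input document for a document-type unlinked input view.
// taskPath (the path part of the Ultra task URL) is an assumed input used to
// derive path_info; server_ip/server_port/client_ip/client_port are omitted.
function requestToDocument({ method, uri, headers, body, taskName, taskPath }) {
  const doc = {};
  for (const [name, value] of Object.entries(headers)) {
    doc[name.toLowerCase()] = value; // e.g. 'User-Agent' -> $['user-agent']
  }
  // The base URL only resolves relative URIs; it never appears in the output.
  const url = new URL(uri, "http://placeholder.invalid");
  doc.content = body;
  doc.uri = uri;
  doc.method = method;
  doc.query = parseQuery(url.search);
  doc.task_name = taskName;
  doc.path_info = url.pathname.startsWith(taskPath)
    ? url.pathname.slice(taskPath.length)
    : "";
  return doc;
}

// Turn an output document into { status, headers, body }.
function documentToResponse(doc) {
  // Assumption: 200 unless the document carries an integer 'status' field.
  const status = Number.isInteger(doc.status) ? doc.status : 200;
  if (!("content" in doc)) {
    // No 'content' field: the whole document becomes the JSON body.
    // Assumption: no response headers are taken from the document here.
    return { status, headers: {}, body: JSON.stringify(doc) };
  }
  const headers = {};
  for (const [name, value] of Object.entries(doc)) {
    // 'content' is the body and an integer 'status' is the status code;
    // every other field is treated as a response header.
    if (name === "content" || (name === "status" && Number.isInteger(value))) continue;
    headers[name] = value;
  }
  return { status, headers, body: JSON.stringify(doc.content) };
}

// Only the first output document becomes the response. With no output, the
// real request hangs until it times out; this model returns null instead.
function responseForOutputs(outputs) {
  return outputs.length > 0 ? documentToResponse(outputs[0]) : null;
}
```

For example, with the query string shown earlier, `parseQuery("foo=bar&foo=baz&one=1")` produces `{ foo: ["bar", "baz"], one: ["1"] }`.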

When a pipeline runs in Ultra mode, error views are implicitly added to all Snaps, regardless of how the Snaps are configured.


Creating Ultra Tasks

To create an Ultra task:

  1. In SnapLogic Manager, create a new task.
  2. Select the Snaplex on which to run the pipeline. If the pipeline has an unconnected input view, the Snaplex must have a FeedMaster associated with it.
  3. Select the Ultra run policy for the task.
  4. Specify how many instances of the pipeline should run on the Snaplex.
    For example, if the Snaplex has 5 nodes and you request 10 instances, 2 are started on each node. Instances are distributed across the nodes based on how many Ultra tasks are already running on each node. Each instance connects to the input queue and receives messages as they arrive. When multiple instances are running, each message is delivered to only one instance. If the pipeline or its node fails for any reason, the message remains unacknowledged in the queue. Once the consumer is recognized as no longer connected, the message goes back onto the queue and is picked up by another instance. No message or request should be lost.
  5. The Bearer Token is optional. You can clear this field and have the pipeline itself handle authentication.
  6. Set Max Failures to the maximum number of failures before the task is automatically disabled. Set to 0 if the task should never be disabled. Default value is 10.
  7. Set Max In-Flight to the maximum number of documents that can be processed by an instance at any one time. Default is 200. 
  8. Click Create.
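Step 4 says only that placement depends on how many Ultra tasks each node is already running. One plausible reading is a least-loaded policy, which this Node.js sketch implements. The `placeInstances` function and the node names are hypothetical, and the real scheduler may weigh other factors.

```javascript
// Hypothetical least-loaded placement: each new instance goes to the node
// currently running the fewest Ultra pipeline instances. Ties go to the node
// listed first.
function placeInstances(runningPerNode, instances) {
  const nodes = Object.keys(runningPerNode);
  if (nodes.length === 0) throw new Error("the Snaplex has no nodes");
  const load = { ...runningPerNode };                  // current load per node
  const placement = Object.fromEntries(nodes.map((n) => [n, 0])); // new instances
  for (let i = 0; i < instances; i++) {
    let target = nodes[0];
    for (const node of nodes) {
      if (load[node] < load[target]) target = node;
    }
    load[target] += 1;
    placement[target] += 1;
  }
  return placement;
}
```

With 5 idle nodes and 10 requested instances, this policy places 2 instances on each node, matching the example in step 4. Nodes that are already busy receive fewer new instances.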

The pipeline is started on the Snaplex using the policy specified in the task. The pipeline consumes messages from the FeedMaster and processes the documents. The pipeline restarts automatically after the task or the pipeline itself is modified and saved.
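The delivery behavior in step 4 can be modeled with a small in-memory queue. Under that behavior, each message goes to one instance and stays unacknowledged until processed, and it returns to the queue if its consumer disconnects. The model also applies the per-instance Max In-Flight limit from step 7.

This Node.js sketch is a mental model only. `UltraQueue` and its methods are hypothetical. When the real system acknowledges a message is an assumption, and the sketch marks it in a comment.

```javascript
// Hypothetical in-memory model of the queue semantics described in step 4.
// maxInFlight caps unacknowledged messages per consumer (cf. Max In-Flight).
class UltraQueue {
  constructor(maxInFlight) {
    this.maxInFlight = maxInFlight;
    this.pending = [];         // messages waiting for delivery
    this.inFlight = new Map(); // consumer id -> Set of unacknowledged messages
  }

  connect(consumer) {
    this.inFlight.set(consumer, new Set());
  }

  publish(message) {
    this.pending.push(message);
  }

  // Deliver the next pending message to one consumer only. Returns undefined
  // if nothing is pending or the consumer is at its in-flight limit.
  deliver(consumer) {
    const unacked = this.inFlight.get(consumer);
    if (!unacked || unacked.size >= this.maxInFlight || this.pending.length === 0) {
      return undefined;
    }
    const message = this.pending.shift();
    unacked.add(message);
    return message;
  }

  // Assumption: a message is acknowledged once the pipeline has produced its
  // response, which frees one in-flight slot.
  ack(consumer, message) {
    this.inFlight.get(consumer)?.delete(message);
  }

  // The consumer is recognized as disconnected: its unacknowledged messages go
  // back to the front of the queue so another instance can pick them up.
  disconnect(consumer) {
    const unacked = this.inFlight.get(consumer) ?? new Set();
    this.pending.unshift(...unacked);
    this.inFlight.delete(consumer);
  }
}
```

In this model, a consumer that fails mid-request never loses work: anything it had not acknowledged is redelivered to another consumer. The Max In-Flight limit bounds how much work a single failure can push back onto the queue.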

Calling Ultra Tasks

To send HTTP requests to the pipeline, you need the HTTP Authorization header and the endpoint from the task's Details page. If a Bearer Token is configured, every request sent to the endpoint must include the Authorization header; otherwise, an HTTP 401 Unauthorized error is returned. The HTTP endpoint belongs to the FeedMaster that is part of the Snaplex on which the task is configured to run. For example, the following cURL command sends a GET request with the Authorization header to the FeedMaster:

  $ curl -k -H 'Authorization: Bearer jH6ofxtn3qe8jGoiSWbW3adW0N6KXziV' \
      https://groundplex.local:8084/api/1/rest/feed-master/queue/Snaplogic/projects/llfeed-demo/hello-world-task
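The same request can be sent from Node.js, which is convenient for scripted testing. In the sketch below, `buildUltraRequest` and `callUltraTask` are hypothetical helpers, and the token and endpoint are whatever the task's Details page shows. The optional extra path and query parameters reach the pipeline as path_info and query. Unlike `curl -k`, Node's `fetch` verifies the FeedMaster's TLS certificate.

```javascript
// Hypothetical helper that builds the same GET request as the cURL example.
// pathInfo and query are optional; they surface in the input document as
// path_info and query.
function buildUltraRequest(endpoint, bearerToken, pathInfo = "", query = {}) {
  const url = new URL(endpoint + pathInfo);
  for (const [name, values] of Object.entries(query)) {
    for (const value of [].concat(values)) url.searchParams.append(name, value);
  }
  const headers = {};
  // Send Authorization only when the task has a Bearer Token configured.
  if (bearerToken) headers.Authorization = `Bearer ${bearerToken}`;
  return { url: url.toString(), options: { method: "GET", headers } };
}

// Send the request with Node 18+ global fetch. Unlike `curl -k`, this checks
// the TLS certificate, so a self-signed FeedMaster certificate must be trusted
// first, for example via NODE_EXTRA_CA_CERTS set before Node starts.
async function callUltraTask(endpoint, bearerToken) {
  const { url, options } = buildUltraRequest(endpoint, bearerToken);
  const res = await fetch(url, options);
  if (res.status === 401) throw new Error("401 Unauthorized: check the Bearer Token");
  return { status: res.status, body: await res.text() };
}
```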


You can monitor the runs in the SnapLogic Dashboard and stop them by disabling or deleting the task. Also check the logs on the task's Details page: failures while preparing the pipeline are logged there and do not show up in the Dashboard. The FeedMaster writes a 'feed_master_access.log' file in the web server Common Log Format. Each entry has an additional integer giving the number of milliseconds the request took to process.
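Because the access log follows the Common Log Format plus a duration, it is easy to scan for slow requests. The Node.js sketch below assumes the duration is the last space-separated field on each line. This page does not spell out the exact layout, so check it against a real log first. `parseAccessLine` and `slowest` are hypothetical helpers.

```javascript
// Hypothetical parser for one line of feed_master_access.log, assuming the
// Common Log Format followed by a space and the request duration in ms:
//   host ident authuser [date] "request" status bytes durationMs
const ACCESS_LINE = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) (\d+)$/;

function parseAccessLine(line) {
  const m = ACCESS_LINE.exec(line.trim());
  if (!m) return null; // not in the assumed format
  const [, host, ident, user, time, request, status, bytes, ms] = m;
  const [method, path, protocol] = request.split(" ");
  return {
    host, ident, user, time, method, path, protocol,
    status: Number(status),
    bytes: bytes === "-" ? null : Number(bytes), // "-" means no body was sent
    durationMs: Number(ms),
  };
}

// Return the n slowest parseable entries, slowest first.
function slowest(lines, n) {
  return lines
    .map(parseAccessLine)
    .filter(Boolean)
    .sort((a, b) => b.durationMs - a.durationMs)
    .slice(0, n);
}
```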

Limitations

Ultra tasks have the following limitations:

  • Snaps that wait for all of their input before producing output, such as Aggregate, Sort, or Join, cannot be used in Ultra tasks.
    • DB Snaps that write to the database must have the batch size in the account set to one; otherwise, the effect of the writes would not be visible.
    • Unsupported Snaps include:
      • Aggregate
      • Head
      • Tail
      • Zip
      • DB Bulk Loaders
      • Tableau Write
      • Excel Formatter
      • CSV Formatter
      • Fixed Width Formatter
      • Sort
      • Diff
    • Snaps that work only when batching is disabled:
      • REST PUT/POST
      • Email Sender
      • SOAP Execute
      • DB Insert Snaps
      • JSON Formatter: select the 'Format each document' option
      • XML Formatter: clear the 'Root Element' field
    • Snaps that require a timeout to be specified:
      • SOAP
      • All REST Snaps

    • Script and Execute Script Snaps must pass the original input document as the first argument to the output view's 'write()' method. For example, the loop body in the JavaScript template reads:

        // Read the next document, wrap it in a map and write out the wrapper
        var doc = this.input.next();
        var wrapper = new java.util.HashMap();
        wrapper.put("original", doc);
        this.output.write(wrapper);

      For an Ultra pipeline, change the last line to:

        this.output.write(doc, wrapper);

      The original document is needed so that lineage can be maintained.

  • If an Ultra Pipeline has an unlinked input view, it should also have one or more unlinked output views.
  • Pipeline parameters cannot be changed at run time. They are specified in the pipeline itself or when the task is created. They are set when the pipeline starts, which happens only at initiation, not per document.
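The view rules in Building Pipelines for Ultra Tasks and the Snap lists above can be combined into a rough pre-flight check. The Node.js sketch below is illustrative only. The pipeline description object is a hypothetical shape, since SnapLogic does not expose pipelines this way. The exact Snap name spellings used as lookup keys are assumptions.

```javascript
// Hypothetical pre-flight check of a pipeline against the Ultra rules above.
// Snaps named on this page as unusable in Ultra tasks (Join comes from the
// first limitation; the rest from the "Unsupported Snaps" list).
const UNSUPPORTED = new Set([
  "Aggregate", "Join", "Head", "Tail", "Zip", "DB Bulk Loaders", "Tableau Write",
  "Excel Formatter", "CSV Formatter", "Fixed Width Formatter", "Sort", "Diff",
]);

// Snaps that work only with specific settings (reported as warnings).
// Name spellings are assumptions.
const NEEDS_SETTINGS = new Map([
  ["REST Post", "disable batching and set a timeout"],
  ["REST Put", "disable batching and set a timeout"],
  ["Email Sender", "disable batching"],
  ["SOAP Execute", "disable batching and set a timeout"],
  ["JSON Formatter", "select 'Format each document'"],
  ["XML Formatter", "clear the 'Root Element' field"],
]);

function checkUltraPipeline({ unlinkedInputs, unlinkedOutputs, snaps }) {
  const errors = [];
  const warnings = [];
  const noViews = unlinkedInputs === 0 && unlinkedOutputs === 0;
  const requestResponse = unlinkedInputs === 1 && unlinkedOutputs >= 1;
  if (!noViews && !requestResponse) {
    errors.push("needs no unlinked views, or one unlinked input and at least one unlinked output");
  }
  for (const snap of snaps) {
    if (UNSUPPORTED.has(snap)) errors.push(`unsupported Snap: ${snap}`);
    if (NEEDS_SETTINGS.has(snap)) warnings.push(`${snap}: ${NEEDS_SETTINGS.get(snap)}`);
  }
  // Only the input/output shape needs a FeedMaster in the Snaplex.
  return { ok: errors.length === 0, needsFeedMaster: requestResponse, errors, warnings };
}
```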
