DataCollection

Data collection is the process of gathering data and information for use in (chat) completions. Data can be gathered from various sources such as OneDrive, file shares, cloud services, APIs, databases, or custom-built connectors. Each data-collection process is defined by a series of steps and conversions that prepare the data for use in ScaiGrid and completions.

Agent

 

Indexer

The main purpose of an indexer is to determine which objects are new, modified, or deleted.
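
One way to picture this is as a comparison between two runs. The sketch below is an illustration only, not ScaiGrid's actual implementation: it assumes each run produces a mapping from an object identifier to a fingerprint (for example a content hash or a modification timestamp). The names `IndexDiff` and `diff_snapshots` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class IndexDiff(NamedTuple):
    """Result of comparing two indexer runs (hypothetical structure)."""
    added: List[str]
    modified: List[str]
    deleted: List[str]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> IndexDiff:
    """Compare two {object_id: fingerprint} snapshots.

    Assumption: a changed fingerprint means the object was modified.
    """
    # Present now, but not in the previous run
    added = sorted(k for k in current if k not in previous)
    # Present in the previous run, but gone now
    deleted = sorted(k for k in previous if k not in current)
    # Present in both runs with a different fingerprint
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return IndexDiff(added, modified, deleted)


previous_run = {"a.docx": "h1", "b.docx": "h2", "c.docx": "h3"}
current_run = {"a.docx": "h1", "b.docx": "h2-changed", "d.docx": "h4"}
result = diff_snapshots(previous_run, current_run)
```

Here `result.added` lists `d.docx`, `result.modified` lists `b.docx`, and `result.deleted` lists `c.docx`; the unchanged `a.docx` appears in none of the three lists.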

Endpoints

Collection

A collection represents a set of objects that are logically grouped together. Collections are processed in the order they are announced: first in, first out (FIFO). In its simplest form, a DataCollector agent announces one collection for each Indexer run, containing the objects that are listed for addition, modification, or deletion.
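
The FIFO ordering can be sketched with a simple queue. This is a minimal illustration under assumptions: the `Collection` fields and the `CollectionQueue` class with its `announce` and `next` methods are hypothetical names, not part of a documented ScaiGrid API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Collection:
    """Objects from one indexer run, grouped by change type (hypothetical)."""
    name: str
    added: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)


class CollectionQueue:
    """Processes collections in the order they were announced (FIFO)."""

    def __init__(self) -> None:
        self._pending: Deque[Collection] = deque()

    def announce(self, collection: Collection) -> None:
        # New collections always join the back of the queue
        self._pending.append(collection)

    def next(self) -> Optional[Collection]:
        # The oldest announced collection is taken from the front
        return self._pending.popleft() if self._pending else None


queue = CollectionQueue()
queue.announce(Collection("run-1", added=["a.docx"]))
queue.announce(Collection("run-2", modified=["a.docx"], deleted=["b.docx"]))

processed = []
while (collection := queue.next()) is not None:
    processed.append(collection.name)
```

Processing in announcement order matters here: if `run-2` were handled before `run-1`, the modification of `a.docx` would be applied before the object had been added.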

Object

An object is a structured representation of data in the widest sense. Objects can be concrete things, such as a document or a webpage, but they can also be more abstract. The indexer determines the definition of the object and the form in which it is presented, so an object can take any form that fits the needs of the implementation and the purpose for which it is stored for use in (chat) completions.

Examples of an object are:

  • Document
  • Webpage or a complete website
  • A single (database) record or a complete (database) table
  • Response from an API or a collection of responses combined
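
Because the shape of an object is left to the indexer, a generic container is one plausible way to model it. The sketch below is an assumption made for illustration; the `DataObject` class and its fields are hypothetical and do not describe ScaiGrid's real data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DataObject:
    """Generic object container (hypothetical field names)."""
    object_id: str
    kind: str      # e.g. "document", "webpage", "record", "table"
    content: Any   # the form is decided by the indexer
    metadata: Dict[str, str] = field(default_factory=dict)


# A concrete object: a single document
document = DataObject("doc-1", "document", "Text of the document")

# A single database record, and a table that groups such records
record = DataObject(
    "row-42", "record", {"id": 42, "name": "Example"}, {"table": "customers"}
)
table = DataObject("customers", "table", [record.content])
```

The same container holds a document, a single record, or a whole table; only `kind` and the shape of `content` differ.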

Chunk

A chunk, also known as a data-chunk, contains the smallest logical representation of a part of an object, such as a paragraph of a document. Splitting the object into smaller parts makes it possible to quote and refer to specific parts of an object instead of the complete object. This gives more context and better results in completions. Combined with chunk-specific metadata, the context can be enriched for even better results.
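
As a rough illustration of paragraph-level chunking, the sketch below splits a text on blank lines and attaches metadata to every chunk. It assumes paragraphs are separated by blank lines; the `Chunk` class and `split_into_chunks` function are hypothetical, and a real implementation may split on other boundaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """One paragraph of an object plus its metadata (hypothetical)."""
    object_id: str
    index: int     # position of the chunk within the object
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_into_chunks(
    object_id: str, text: str, metadata: Optional[Dict[str, str]] = None
) -> List[Chunk]:
    """Split text on blank lines and return one chunk per paragraph."""
    # Drop empty fragments so extra blank lines do not create empty chunks
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Each chunk gets its own copy of the metadata
    return [
        Chunk(object_id, i, paragraph, dict(metadata or {}))
        for i, paragraph in enumerate(paragraphs)
    ]


doc_text = "First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph."
chunks = split_into_chunks("doc-1", doc_text, {"source": "example"})
```

Each chunk keeps the `object_id` and its `index`, so a completion can refer to a specific paragraph rather than to the whole document.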