EMW, Inc.

2026-0095 Big Data and AI Technology for Searchable Archives (NS) - WED 8 Jul

Company

EMW, Inc.

Role

2026-0095 Big Data and AI Technology for Searchable Archives (NS) - WED 8 Jul

Job type

Contract

Salary

Not disclosed by employer

Job description

BIDDING INSTRUCTIONS

The Bidder shall submit the proposed person's Curriculum Vitae (CV). The CV shall contain enough detail and evidence of the individual's previous work to show suitability and compliance for the job, based on the work description included in the Statement of Work.

Deadline Date: Wednesday 08 July 2026

Requirement: Big Data and AI Technology for Raw Data to Searchable Archives - Data Processing Pipeline

Location: On-site at NATO Communications and Information Agency, The Hague, The Netherlands

Period of Performance: 2026-27: 12 August 2026 – 28 February 2027

Required Security Clearance: NATO SECRET

  • INTRODUCTION

The NATO Communications and Information Agency (NCIA) located in The Hague, Netherlands, is currently involved in processing vast amounts of highly variant data coming from theatre for the purpose of efficient archiving.

Within the NCIA Chief Technology Office, the Exploiting Data Science and Artificial Intelligence (EDS&AI) team is tasked with applying Big Data and AI technology to prepare, run and adjust pipelines that convert varied source data into archiving formats and metadata, and to prepare that data for semantic search.

NATO has an obligation to support national investigations into situations that occurred in theatre. To support the teams involved as effectively as possible, the EDS&AI team brings the expertise to extract and exploit the vast and varied data available, using the Agency's high-performance computing classified sandbox.

The EDS&AI team provides the core data science skills and technology needed for big data analysis and AI, and applies innovative technology to data whenever it is not possible to extract value with conventional approaches.

  • OBJECTIVE

This Statement of Work describes the work necessary to provide specific AI and Data Exploitation activities for processing raw data from theatre to searchable archives. The services will be provided to the NCIA CTO/EDS&AI team, as they deliver specialised Data Science and AI results to their stakeholders in NATO Headquarters and NATO Allied Command Operations.

Overarching objectives

  • Make required documents from theatre accessible and searchable by archivists during execution
  • Capture document contents into long term preservation formats
  • Capture Functional Area System (FAS; back-up) contents into long term preservation formats
  • Identify (and remove) duplicate documents, records of temporary value and non-records that are not required for archiving
  • Provide interim and final data reports describing actions and results

This task is structured as a deliverable-based engagement and not as level-of-effort support.

  • SCOPE OF WORK

Under the direction of CTO-EDS&AI, the Contractor shall design, build, adapt, execute and maintain data processing pipelines within the NCIA classified sandbox environment.

Setting up and improving pipelines that process all required documents, uniquely identifying each document and tracing decisions and processing steps. This is to be conducted in the provided classified sandbox environment, with the provided performance hardware and toolsets.
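
One way to read the traceability requirement above is that every document gets a stable identifier and every pipeline step appends a record of what it decided. The Python sketch below is only an illustration under stated assumptions: it uses a SHA-256 content hash as the identifier and an in-memory list as the trace, and the names `document_id`, `TraceEntry` and `record_step` are hypothetical, not taken from the Statement of Work.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def document_id(content: bytes) -> str:
    """Derive a stable identifier from the file content (assumption: SHA-256)."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class TraceEntry:
    doc_id: str
    step: str       # hypothetical step name, e.g. "convert"
    decision: str   # what the step decided, e.g. "converted" or "skipped"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_step(trace: list[TraceEntry], content: bytes, step: str, decision: str) -> TraceEntry:
    """Append one trace record for a document and return it."""
    entry = TraceEntry(document_id(content), step, decision)
    trace.append(entry)
    return entry


# Illustrative usage with made-up content.
trace: list[TraceEntry] = []
record_step(trace, b"example report", "ingest", "accepted")
record_step(trace, b"example report", "convert", "converted")
```

Because the identifier depends only on content, the same file yields the same `doc_id` at every step, so its trace records can be joined later for reporting.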

Implementing and improving pipeline steps for marking duplicate files, based on file attributes, path structure and content similarity, and rules for considering a file or structure a duplicate.
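
As a rough illustration of the content-based part of duplicate marking, the Python sketch below groups byte-identical files by hash and scores near-duplicates with Jaccard similarity over word shingles. These are assumed techniques chosen for the example, not ones named in the Statement of Work. The function names and the 0.8 threshold are hypothetical, and rules based on file attributes and path structure are not covered.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group paths whose content is byte-identical (same SHA-256 digest)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: texts at or above the threshold count as duplicates."""
    return jaccard(a, b) >= threshold


# Illustrative usage with made-up files.
files = {
    "a/report.txt": b"status report week one",
    "b/copy.txt": b"status report week one",
    "c/other.txt": b"different",
}
print(exact_duplicates(files))  # [['a/report.txt', 'b/copy.txt']]
```

In practice the threshold and the rules for treating a whole folder structure as a duplicate would be agreed with the archiving SMEs.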

Extracting document-format records from Functional Area Systems (FAS) databases and back-ups. Archiving SMEs and system SMEs are available for guidance on target formats and source system structure and data interpretation. Each FAS is processed separately.

Processing and monitoring progress of various office, image and video file types to accepted archiving formats, including extraction of metadata and preparing semantic search indexes.

Automating the registration of all processed documents with semantic indexes using the sandbox natural language search tool.

Automating the final copy of all non-duplicate and extracted archive documents with content and metadata to the NATO archiving system.

Reporting status, progress and statistics of the raw files being processed to archive formats, metadata and search indexes.
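
The kind of statistics described above could be derived from per-file processing records. The Python sketch below assumes a simple record layout with `file_type` and `status` fields and an `"archived"` status value. Both are hypothetical, as are the sample records.

```python
from collections import Counter


def summarize(records: list[dict[str, str]]) -> dict[str, object]:
    """Count files per status and per file type, plus a completion ratio."""
    by_status = Counter(r["status"] for r in records)
    by_type = Counter(r["file_type"] for r in records)
    total = len(records)
    done = by_status.get("archived", 0)
    return {
        "total": total,
        "by_status": dict(by_status),
        "by_type": dict(by_type),
        "completion": done / total if total else 0.0,
    }


# Made-up sample records for illustration only.
records = [
    {"path": "a.docx", "file_type": "office", "status": "archived"},
    {"path": "b.jpg", "file_type": "image", "status": "failed"},
    {"path": "c.mp4", "file_type": "video", "status": "archived"},
    {"path": "d.docx", "file_type": "office", "status": "pending"},
]
report = summarize(records)
print(report["completion"])  # 0.5
```

A flat summary like this can be exported as CSV or JSON and used as a data source for a dashboard.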

Delivering full reporting of results, trace of pipeline steps taken and stakeholder-accepted failures. Quarterly updates.

In general, most items will translate to a build (new pipeline or processing step), execute (reported progress on data batches), improve (optimized or corrected pipeline or processing step) or monitor (check on logs and progress statistics) activity. Pipeline orchestration is expected to use KNIME. Reporting is expected to target Microsoft Power BI dashboards. GitLab is expected to be used for source code management and documentation.

  • DELIVERABLES AND PAYMENT MILESTONES

Work shall be delivered through Processing Units (PU). A Processing Unit represents completion of a prioritized measurable work package. The following Processing Unit types and quantities apply:

Delivery or enhancement of a pipeline component: 4 PUs

Execution of processing on an agreed logical data segment: 6 PUs

Operational dashboard capability: 3 PUs

Maintenance or optimisation activity improving processing capability: 5 PUs

Processing Units may vary in complexity. Weighting and sequencing shall be agreed during execution based on dataset characteristics and operational priorities.

Planning assumptions

  • Target delivery rate of approximately two (2) to three (3) Processing Units per month
  • Capability-focused deliveries expected during early execution phases
  • Processing and stabilisation-focused deliveries expected in later phases
  • NCIA reserves the right to adjust priorities and sequencing.

Deliverable 01: Processing Unit. Quantity: 18. Cost Ceiling: EUR 5,450 per PU. Payment Milestone: Target delivery rate of approximately two (2) to three (3) Processing Units per month. Payment upon successful acceptance of each Processing Unit and signed Delivery Acceptance Sheet (DAS).
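
The figures above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the quantities and the per-PU ceiling stated in this posting. The variable names and the shortened type labels are illustrative.

```python
import math

# Quantities per Processing Unit type, as listed in the Statement of Work.
pu_quantities = {
    "pipeline component": 4,
    "processing execution": 6,
    "operational dashboard": 3,
    "maintenance or optimisation": 5,
}
COST_CEILING_EUR = 5_450  # per PU, from Deliverable 01

total_pus = sum(pu_quantities.values())
total_ceiling_eur = total_pus * COST_CEILING_EUR

# Months needed at the target delivery rates of 3 and 2 PUs per month.
months_fastest = math.ceil(total_pus / 3)
months_slowest = math.ceil(total_pus / 2)

print(total_pus, total_ceiling_eur, months_fastest, months_slowest)
# 18 98100 6 9
```

The four type quantities sum to the 18 PUs of Deliverable 01, which gives an overall ceiling of EUR 98,100 and six to nine months of delivery at the stated target rate.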

Under this Statement of Work, the Contractor shall also:

  • Produce Processing Unit completion reports (format: email update on section 4 items), which include details of activities performed.
  • Participate in meetings and boards on an as-needed basis, or as requested by NCI Agency.

Payment shall be dependent upon successful acceptance of the Processing Units and the Delivery Acceptance Sheet (Annex C). Invoices shall be accompanied by a Delivery Acceptance Sheet signed by the Contractor and the project authority.

Note: All deliverables and the underlying work under this contract will be in English. Deliverables will be sent via email or removable media. Disclosure of any or all deliverables to any third party besides NCI Agency will require the prior agreement of NCI Agency.

  • COORDINATION AND REPORTING

The Contractor shall provide services on-site at NATO Communications and Information Agency in The Hague. The Contractor shall coordinate with, and report to, the NCIA EDS&AI team in The Hague.

The Contractor shall be given access to necessary NATO IT systems, and shall comply with all necessary policies and procedures.

The Contractor shall participate in regular status update meetings and other meetings, either physically in the office or remotely via conference call, according to their manager's instructions.

For each Processing Unit to be considered complete and payable, the Contractor must report the outcome of their work, first verbally during the retrospective meeting and then in writing within three (3) days after work on the deliverable has ended. The written report shall be a short email to the nominated NCI Agency point of contact, briefly describing the work performed and the results achieved.

Knowledge transfer activities may include provision of operational documentation, pipeline overview briefings and lessons learned. Detailed arrangements shall be agreed during execution.

  • SCHEDULE

This task order will be active immediately after signing of the contract by both parties.

The 2026-2027 period of performance is 12 August 2026 to 28 February 2027.

  • SECURITY

The services required under this SOW require a valid NATO SECRET security clearance prior to the start of the engagement.

  • CONSTRAINTS

All documentation provided under this statement of work will be based on NCI Agency templates and/or agreed with the NCIA service delivery point of contact.

All support, maintenance, documentation and required code will be stored under configuration management and/or in the provided NCI Agency tools.

  • PRACTICAL ARRANGEMENTS

The contractor is expected to provide services on-site at NATO Communications and Information Agency, The Hague, The Netherlands.

The services shall be provided during normal office hours following the on-site location calendar.

The contractor will provide services under the direction and guidance of the CTO-EDS&AI or their designated representative.

The contractor is not expected to travel.

ONE contractor must accomplish this work. In the event the contractor leaves during the contract period, a new contractor who has the proven required qualifications and is evaluated as qualified and suitable shall replace them. All normal AAS+ Framework Contract Terms and Conditions apply.

  • SPECIFIC REQUIREMENTS

  • At least 3 years of practical experience in the field of data science and/or data analytics.
  • Experience using data processing, visualization and analytics software packages and development environments, preferably including KNIME, VS Code, GitLab, Power BI, Jupyter Lab, and Docker-based APIs.
  • Experience with Big Data processing, creating and utilizing containerized building blocks and running containers (APIs) on Kubernetes clusters.
  • Experience with programming and scripting in languages such as Python, R and SQL, and working with data formats including CSV, XML and JSON.
  • Experience performing content extraction from files, databases and systems, including LLM-based embedding models, entity extraction, keyword extraction and content similarity measures.
  • Creative, flexible and proactive in overcoming obstacles.
  • Good drafting, communication and presentation skills in English, including at both technical and non-technical levels.
  • High attention to detail and accuracy.

  • EDUCATIONAL QUALIFICATIONS

  • Master's degree in Computer Science, Engineering or a relevant field.
  • A higher degree in Data Science is preferred.