Data Engineer — AI & Spatial Data Infrastructure | Careers

Role overview

In this role, you will design and operate production-grade data pipelines that power SPARC Global's document intelligence and geospatial AI platforms. You will work with Python, cloud data services, PostGIS, vector databases, and orchestration tools to ingest, transform, and serve large volumes of structured and unstructured data, including PDFs, satellite imagery, and GIS datasets.

This is an on-site role based at the SPARC Global office in Bhubaneswar, ideal for an engineer who thrives on messy real-world data, high-throughput ETL, and the intersection of spatial computing with modern AI infrastructure.

Key responsibilities

Design, build, and maintain Python-based ETL pipelines for document parsing, spatial data ingestion, and embeddings generation.
Operate orchestration workflows using Airflow, Databricks, Azure Data Factory, Prefect, or equivalent tooling.
Manage PostgreSQL/PostGIS databases and vector stores (Pinecone, Weaviate, Qdrant, pgvector) for production workloads.
Process raster and vector geospatial data (Cloud Optimized GeoTIFFs, STAC catalogs, shapefiles, GeoJSON) with GDAL, rasterio, and Fiona.
Implement robust handling for challenging document sources such as scanned PDFs, rotated pages, and password-protected files, including the OCR pipelines they require.
Containerize and deploy data services using Docker on Azure, AWS, or GCP.
Monitor pipeline reliability, performance, and data quality at scale (1,000–10,000+ files per day).

Qualifications & skills

B.Tech / BE or equivalent in Computer Science, IT, or a related discipline.
3+ years of professional Python experience in production environments.
2+ years building ETL or data pipeline systems end to end.
Strong hands-on skills with pandas, NumPy, and async Python patterns.
Experience with PostgreSQL and ideally PostGIS for spatial data storage and querying.
Familiarity with document parsing libraries (pdfplumber, Camelot, openpyxl, python-docx, or OCR tooling).
Working knowledge of Docker and at least one major cloud platform (Azure preferred).
Public GitHub profile demonstrating relevant data engineering work.
Ability to work from the SPARC Global office in Bhubaneswar (or willingness to relocate).

Nice to have

Experience with vector databases and embeddings pipelines (OpenAI, Cohere, or similar APIs).
Exposure to raster/vector GIS processing, CRS transformations, and spatial indexing.
Familiarity with Databricks, Snowflake, Kafka/Kinesis, Neo4j, or FastAPI/Flask data services.
Experience building pipelines that process 10,000+ files per day.
Regular use of AI coding assistants (Cursor, Copilot, Claude) in your development workflow.

Why join SPARC Global

Work on cutting-edge AI and spatial data infrastructure with real enterprise and government datasets.
Hands-on engineering role with ownership of pipeline architecture and production reliability.
Collaborative Bhubaneswar engineering team with exposure to global SPARC projects.
Fast-moving environment with rolling application review and a 5-business-day response target.
Competitive compensation aligned with experience and market rates in India.
