Nessolabs
Web Scraping Engineer — European Public Procurement
Job type: Contract
Posted: Yesterday
Job description
We're building the data backbone for European public procurement. Our platform aggregates tender data from 100+ e-procurement portals — each with its own quirks, anti-bot protections, and legacy HTML.
We're looking for a scraping engineer who can navigate this landscape: someone who's comfortable with headless browsers, knows how to handle sessions and CAPTCHAs, and won't panic when the same platform serves three different HTML layouts across pages.
What you'll do
- Build and maintain async scrapers (Python + Playwright) against Italian and later European public procurement portals (Maggioli PortaleAppalti, ANAC, MePA, and others)
- Handle real-world challenges: JSESSIONID session management, FriendlyCaptcha/Mosparo anti-bot, Cloudflare WAF, IP rotation with rate limit backoff
- Parse Italian data formats — amounts (€ 1.234.567,89), dates (DD/MM/YYYY, textual), CIG/CUP identifiers with placeholder detection
- Extract and process documents: PDF, .p7m (PKCS#7 signed), ZIP/7Z archives, with OCR fallback
- Integrate scrapers into our Prefect orchestration pipeline with monitoring, alerting, and anomaly detection
- Work with PostgreSQL, Supabase, ClickHouse, and S3 for dual-sink storage with upsert/idempotency patterns
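To give a flavor of the parsing work, here is a minimal sketch of turning Italian-formatted amounts and numeric dates into typed values. The function names `parse_italian_amount` and `parse_italian_date` are illustrative assumptions, not part of our codebase, and the sketch covers only the numeric DD/MM/YYYY form, not textual dates.

```python
import re
from datetime import date, datetime
from decimal import Decimal


def parse_italian_amount(text: str) -> Decimal:
    """Convert a string such as '€ 1.234.567,89' to a Decimal.

    Hypothetical helper: assumes '.' groups thousands and ',' marks decimals.
    """
    # Drop the currency symbol, spaces, and anything else that is not
    # a digit, a separator, or a minus sign.
    cleaned = re.sub(r"[^\d.,-]", "", text)
    if not re.search(r"\d", cleaned):
        raise ValueError(f"no amount found in {text!r}")
    # Remove thousands separators, then switch the decimal comma to a point.
    return Decimal(cleaned.replace(".", "").replace(",", "."))


def parse_italian_date(text: str) -> date:
    """Parse a numeric DD/MM/YYYY date; raises ValueError on other formats."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()
```

Using `Decimal` rather than `float` keeps monetary values exact, which matters when amounts are later compared or deduplicated.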
What we're looking for
- Strong async Python — you think in asyncio, not time.sleep()
- Playwright or Selenium experience — you've intercepted XHR responses, handled SPAs, and debugged timing issues
- Resilience mindset — retry with backoff, graceful degradation, circuit breakers. Your scraper doesn't crash at 3 AM.
- Comfort with messy HTML: you can write a multi-strategy extractor that handles three different markup patterns on the same site
- Data parsing skills: Italian locale, date formats, CIG validation, document type detection
- Bonus: experience with Italian PA (Pubblica Amministrazione) portals, ANAC/PVL datasets, or OCDS data formats
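As an illustration of the resilience mindset described above, here is a minimal sketch of retrying an async call with exponential backoff and jitter. The helper name `retry_with_backoff` and its default parameters are assumptions made for this example, and a production version would likely catch only specific, retryable exceptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await fetch() until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Exponential growth, capped at max_delay, with full jitter
            # so parallel scrapers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

The non-blocking `asyncio.sleep` lets other scrapers on the same event loop keep working while one of them waits.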
Tech stack
Python 3.11+ · Playwright · httpx · BeautifulSoup · Pydantic · SQLAlchemy 2.0 · PostgreSQL · Prefect · AWS S3 · Supabase
How we hire
No whiteboard algorithms. We'll send you a hands-on technical assessment: a mock procurement portal with real-world challenges. You build a scraper. We evaluate the code.