The Customer: A Government Procurement and RFP Portal
Our client operates a procurement portal that government departments and large institutions use to manage the tendering process - publishing RFPs, evaluating vendor responses, and selecting suppliers. The portal needed to give procurement officers context-aware answers to questions about vendor capabilities, compliance records, product offerings, and industry suitability - all derived from real, up-to-date information scraped from vendor websites.
The Problem
Procurement is research-intensive. Officers evaluating responses to RFPs need to quickly understand whether a vendor is genuinely qualified - but vendor information is scattered across thousands of different websites in unstructured formats.
- Manual Vendor Research is a Time Sink: Procurement officers were spending days researching vendors manually - visiting websites, reading annual reports, cross-referencing certifications, and trying to piece together a coherent picture of what a company actually does. This was slowing down every tender cycle.
- Keyword Search Fails for Complex Queries: The portal's existing search functionality was keyword-based. A query like "vendors with ISO 27001 certification and experience in public sector cloud infrastructure" would return either nothing or a flood of irrelevant results. Real procurement questions are semantic, not lexical.
- Stale and Incomplete Vendor Data: Vendor profiles in the system were static - entered once and rarely updated. Company capabilities, certifications, and offerings evolve constantly, but the portal had no mechanism to reflect those changes.
- No Intelligent Synthesis: Even when relevant documents existed, the system couldn't synthesize them. Officers had to read multiple documents and form their own conclusions manually.
How We Helped
We built an AI-powered web scraping and knowledge indexing system that continuously crawls vendor websites, structures the extracted data, and feeds it into a retrieval-augmented generation (RAG) pipeline powering the portal's AI query engine.
- Intelligent Web Crawler: Starting from a list of vendor root domains, the crawler discovers and traverses subdomains, product pages, blog posts, case studies, certification pages, and downloadable documents (PDFs, datasheets). It renders JavaScript-heavy pages, follows pagination, and throttles its requests to respect rate limits.
- Content Extraction and Structuring: Raw scraped content is processed through an AI pipeline that classifies each page (product info, certifications, about/company, case study, pricing, etc.) and extracts structured data - services offered, industries served, certifications held, client references, geographic coverage, and more.
- Embedding and Knowledge Indexing: Structured content is chunked, embedded using a high-quality embedding model, and stored in a vector database. Metadata filters ensure queries can be scoped by vendor, industry, certification type, and geography.
- RFP Query Engine: The portal's query interface now uses a RAG approach - procurement officers ask natural-language questions and receive synthesized answers drawn from the actual content of vendor websites, each with source references so officers can verify claims and dig deeper.
- Automated Re-Crawl Schedule: Vendor websites are re-crawled on a rolling schedule, ensuring the knowledge base stays current without manual intervention.
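The crawl-scope rules described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the production crawler: it assumes a breadth-first traversal, treats any host equal to or ending in the vendor's root domain as in scope, enforces a fixed per-host delay, and takes an injected `fetch` callable (hypothetical) so the traversal logic runs without network access. JavaScript rendering, PDF parsing, and pagination handling are omitted, and `crawl`, `in_vendor_scope`, `extract_links`, and `HostRateLimiter` are illustrative names.

```python
import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_vendor_scope(url: str, root_domain: str) -> bool:
    """True if url is on the vendor's root domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    root = root_domain.lower()
    return host == root or host.endswith("." + root)


def extract_links(page_url: str, html: str) -> list[str]:
    """Resolve relative links, drop #fragments, keep only http(s) URLs."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


class HostRateLimiter:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> None:
        host = urlparse(url).hostname or ""
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_interval_s - (time.monotonic() - last)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request[host] = time.monotonic()


def crawl(root_domain: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 500, min_interval_s: float = 1.0) -> dict[str, str]:
    """Breadth-first crawl of one vendor's domain; returns {url: html}."""
    limiter = HostRateLimiter(min_interval_s)
    start = f"https://{root_domain}/"  # assumed entry point
    queue = deque([start])
    seen = {start}
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        limiter.wait(url)
        html = fetch(url)
        if html is None:  # fetch failed or non-HTML content
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen and in_vendor_scope(link, root_domain):
                seen.add(link)
                queue.append(link)
    return pages
```

In practice `fetch` would wrap an HTTP client or a headless browser; keeping it injectable means the scope and deduplication rules can be tested against a fixed set of pages.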
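The chunk, embed, and filter steps can be illustrated with a small in-memory index. This sketch assumes word-window chunking with overlap (the 200/40 word sizes are arbitrary) and substitutes a hashed bag-of-words vector for the real embedding model, which the case study does not name, so it captures token overlap rather than meaning. Brute-force cosine similarity stands in for the vector database. The metadata keys such as `vendor` and `certification`, and the names `chunk_text`, `embed`, and `InMemoryIndex`, are hypothetical; each metadata key maps to a set of values, and a filter keeps only chunks whose set contains the requested value before ranking.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field
from typing import Optional

DIM = 256  # size of the stand-in embedding vectors (arbitrary)


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks


def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class Chunk:
    text: str
    source_url: str
    metadata: dict[str, set[str]]
    vector: list[float] = field(default_factory=list)


class InMemoryIndex:
    """Brute-force stand-in for a vector database with metadata filtering."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add_page(self, text: str, source_url: str, metadata: dict[str, set[str]]) -> None:
        for piece in chunk_text(text):
            self.chunks.append(Chunk(piece, source_url, metadata, embed(piece)))

    def search(self, query: str, filters: Optional[dict[str, str]] = None,
               k: int = 5) -> list[Chunk]:
        filters = filters or {}
        # Apply metadata filters first, then rank the survivors by cosine similarity
        # (vectors are unit-length, so the dot product is the cosine).
        candidates = [
            c for c in self.chunks
            if all(value in c.metadata.get(key, set()) for key, value in filters.items())
        ]
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, c.vector)), c) for c in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]
```

Filtering before ranking means a question scoped to one certification or geography only ever ranks chunks tagged with it, instead of relying on the embedding alone to surface them.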
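The cited-answer behaviour of the query engine can be sketched as prompt assembly plus post-processing. This assumes retrieved chunks are numbered in the prompt and the language model is instructed to cite them as `[n]`; the prompt wording, `RetrievedChunk`, `build_prompt`, and `resolve_citations` are hypothetical, and the model call itself is left out because the case study does not say which model or API is used.

```python
import re
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    text: str
    source_url: str


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    context = "\n\n".join(
        f"[{i}] ({c.source_url})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the procurement question using only the numbered sources below. "
        "Cite every claim with its source number, e.g. [2]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def resolve_citations(answer: str, chunks: list[RetrievedChunk]) -> list[str]:
    """Map [n] markers in the model's answer back to source URLs.

    Out-of-range numbers are dropped; URLs keep first-citation order, without duplicates.
    """
    urls: list[str] = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if 1 <= n <= len(chunks):
            url = chunks[n - 1].source_url
            if url not in urls:
                urls.append(url)
    return urls
```

Resolving markers back to URLs after generation, and discarding numbers that point at no retrieved chunk, is one way to ensure every source reference shown to an officer corresponds to a page that was actually retrieved.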
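A rolling re-crawl can be as simple as revisiting the stalest vendors first within a fixed daily budget. The sketch below assumes a per-vendor last-crawled timestamp, a maximum acceptable age, and a daily crawl budget; all three, and the function name `due_for_recrawl`, are hypothetical, since the case study does not describe how the schedule is computed.

```python
from datetime import datetime, timedelta
from typing import Optional


def due_for_recrawl(
    last_crawled: dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
    daily_budget: int,
) -> list[str]:
    """Return up to daily_budget vendor domains to re-crawl, stalest first.

    Vendors never crawled (None) are treated as maximally stale.
    """
    stale = []
    for domain, ts in last_crawled.items():
        effective = ts if ts is not None else datetime.min
        if now - effective >= max_age:
            stale.append((effective, domain))
    stale.sort()  # oldest timestamp first; ties broken by domain name
    return [domain for _, domain in stale[:daily_budget]]
```

Capping each run at a budget spreads the load across days rather than re-crawling every vendor at once, while the stalest-first ordering keeps the oldest data from lingering.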
The Results: A Portal That Actually Knows Its Vendors
RFP query accuracy - measured by whether the returned answer correctly addressed the procurement officer's actual question - jumped from 38% with keyword search to 91% with the RAG query engine. Officers now receive synthesized, sourced answers to complex vendor qualification questions in under 30 seconds.
The procurement team effectively gained a continuously updated intelligence layer on their entire vendor universe - without adding any research staff. Tender cycles shortened, and the quality of vendor evaluation improved measurably.