Multi-Document Agent
In this guide, you learn how to set up an agent that can effectively answer different types of questions over a larger set of documents.
These questions include the following:
- QA over a specific doc
- QA comparing different docs
- Summaries over a specific doc
- Comparing summaries between different docs
We do this with the following architecture:
- Set up a “document agent” over each document: each document agent can do QA and summarization within its own document.
- Set up a top-level agent over this set of document agents. It performs tool retrieval to select the relevant document agents, then applies chain-of-thought (CoT) reasoning over the selected tools to answer a question.
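The two-level architecture above can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not the llamaindex implementation: `DocAgent`, `retrieveTools`, and `topLevelAnswer` are hypothetical names, the document agents return canned strings instead of calling an LLM, and tool retrieval uses a simple word-overlap score as a stand-in for embedding-based retrieval. The CoT reasoning step is omitted; the sketch only shows how a top-level agent might pick a subset of document agents and delegate to them.

```typescript
// Hypothetical stand-in for a per-document agent (not a llamaindex type).
type DocAgent = {
  name: string;
  description: string;
  answer: (question: string) => string;
};

// Lowercase a string and split it into alphanumeric words.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Tool retrieval: score each agent by how many question words appear in its
// description, drop agents with no overlap, and keep the topK best.
// Assumption: a real system would use embedding similarity here.
const retrieveTools = (
  question: string,
  agents: DocAgent[],
  topK: number,
): DocAgent[] => {
  const words = new Set(tokenize(question));
  return agents
    .map((agent) => ({
      agent,
      score: tokenize(agent.description).filter((w) => words.has(w)).length,
    }))
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.agent);
};

// Top-level agent: retrieve the relevant document agents, then delegate.
const topLevelAnswer = (
  question: string,
  agents: DocAgent[],
  topK = 2,
): string[] =>
  retrieveTools(question, agents, topK).map(
    (agent) => `${agent.name}: ${agent.answer(question)}`,
  );

const agents: DocAgent[] = [
  {
    name: "canada_agent",
    description: "Questions about Canada",
    answer: () => "(answer from the Canada document)",
  },
  {
    name: "japan_agent",
    description: "Questions about Japan",
    answer: () => "(answer from the Japan document)",
  },
  {
    name: "brazil_agent",
    description: "Questions about Brazil",
    answer: () => "(answer from the Brazil document)",
  },
];

console.log(topLevelAnswer("Compare the economy of Canada and Japan", agents));
```

Only the Canada and Japan agents share a word with the question, so the Brazil agent is never consulted. Retrieving a small set of tools first keeps the top-level agent's prompt short even when there are many documents.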
Setup and Download Data
We start by installing the necessary libraries and downloading the data.
pnpm i llamaindex
import {
  Document,
  ObjectIndex,
  OpenAI,
  OpenAIAgent,
  QueryEngineTool,
  SimpleNodeParser,
  SimpleToolNodeMapping,
  SummaryIndex,
  VectorStoreIndex,
  serviceContextFromDefaults,
  storageContextFromDefaults,
} from "llamaindex";
For the data, we run through a list of countries and download the Wikipedia page for each one.
import fs from "fs";
import path from "path";

const dataPath = path.join(__dirname, "tmp_data");

// Download the plain-text extract of a Wikipedia page and cache it on disk.
const extractWikipediaTitle = async (title: string) => {
  const filePath = path.join(dataPath, `${title}.txt`);
  if (fs.existsSync(filePath)) {
    console.log(`File already exists for the title: ${title}`);
    return;
  }

  const queryParams = new URLSearchParams({
    action: "query",
    format: "json",
    titles: title,
    prop: "extracts",
    explaintext: "true",
  });
  const url = `https://en.wikipedia.org/w/api.php?${queryParams}`;

  const response = await fetch(url);
  const data: any = await response.json();

  // The API keys its results by page ID, so take the first entry.
  const pages = data.query.pages;
  const page = pages[Object.keys(pages)[0]];
  const wikiText = page.extract;

  // Make sure the target directory exists before writing to it.
  fs.mkdirSync(dataPath, { recursive: true });

  await new Promise((resolve) => {
    fs.writeFile(filePath, wikiText, (err: any) => {
      if (err) {
        // Log the failure but still resolve so the caller can continue.
        console.error(err);
        resolve(title);
        return;
      }
      console.log(`${title} stored in file!`);
      resolve(title);
    });
  });
};
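The download function above reads `page.extract` directly, which assumes the requested title exists. As a hedged sketch, the parsing step could be pulled into a small pure function that returns `null` when the page is missing. `getExtract`, `WikiPage`, and `WikiQueryResponse` are hypothetical names introduced here, and the types cover only the subset of the MediaWiki response this guide relies on, assuming the default JSON format where a missing page carries a `missing` key.

```typescript
// Assumed subset of a MediaWiki "query" response with prop=extracts.
type WikiPage = { title?: string; extract?: string; missing?: string };
type WikiQueryResponse = { query?: { pages?: Record<string, WikiPage> } };

// Hypothetical helper: return the extract of the first page in the response,
// or null if there is no page, the page is flagged as missing, or it has no
// extract text.
const getExtract = (data: WikiQueryResponse): string | null => {
  const pages = data.query?.pages ?? {};
  const first = Object.values(pages)[0];
  if (!first || first.missing !== undefined) {
    return null;
  }
  return typeof first.extract === "string" ? first.extract : null;
};

// Example responses (hand-written for illustration, not real API output).
const found: WikiQueryResponse = {
  query: { pages: { "123": { title: "Canada", extract: "Canada is a country." } } },
};
const notFound: WikiQueryResponse = {
  query: { pages: { "-1": { title: "No such page", missing: "" } } },
};

console.log(getExtract(found));
console.log(getExtract(notFound));
```

With a helper like this, the caller can skip writing a file when the result is `null` instead of saving the string "undefined" to disk.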