Adding Knowledge with File Search
In the previous chapters, you created a basic agent and gave it instructions through a system prompt.
Now it’s time to make your agent smarter by grounding it in your own data.
Why Add Knowledge?
By default, the model only knows what it was trained on - it doesn’t have access to your organization’s private or domain-specific information.
To bridge this gap, we’ll use Retrieval-Augmented Generation (RAG).
- RAG lets the agent fetch relevant information from your own data before generating a response.
- This helps keep your agent’s answers accurate, up to date, and grounded in your actual data rather than in the model’s training data alone.
- In Azure AI Foundry, we’ll use the File Search feature to implement this.
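To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch. It is not how File Search works internally: it ranks chunks by simple word overlap instead of vector embeddings, and the sample chunks and the `tokenize`, `retrieve`, and `build_prompt` helpers are hypothetical names invented for illustration.

```python
import re

# Hypothetical sample chunks; in the workshop the real content lives in ./documents.
CHUNKS = [
    "The Seattle store is open from 11am to 10pm every day.",
    "The Margherita pizza has tomato, mozzarella, and basil.",
    "The Redmond store is closed on Mondays.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, chunks):
    """Place the retrieved context ahead of the question sent to the model."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


print(build_prompt("When is the Seattle store open?", CHUNKS))
```

The key idea is the order of operations: the relevant text is looked up first and only then handed to the model together with the question.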
In this chapter, you’ll use a folder called ./documents that contains information about Contoso Pizza stores - such as locations, opening hours, and menus.
We’ll upload these files to Azure AI Foundry, create a vector store, and connect that store to the agent using a File Search tool.
Step 1 - Create a Vector Store Script
We’ll build this step by step to make sure everything is clear.
Your goal: create a script that uploads files, creates a vector store, and vectorizes your data for search.
Part A - Prepare Your Environment
Goal: Load secrets from .env and import the necessary SDKs.
Create a new file called add_data.py and add:
import os
from dotenv import load_dotenv
# Azure SDK imports
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import FilePurpose
# Load environment variables (expects PROJECT_CONNECTION_STRING in .env)
load_dotenv(override=True)
Why:
- .env keeps your credentials separate from code.
- AIProjectClient lets you interact with your Azure AI Foundry project.
- FilePurpose.AGENTS tells the service these files are for agents.
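The script expects PROJECT_CONNECTION_STRING to be present after the .env file is loaded. As an optional safeguard, a small helper can fail early with a readable message when the variable is missing. This is only a sketch: `require_env` is a hypothetical helper, not part of any Azure SDK, and the placeholder value exists purely so the example runs on its own.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set. Add it to your .env file and re-run.")
    return value


# Demonstration only: use a placeholder so the sketch runs without a real .env file.
os.environ.setdefault("PROJECT_CONNECTION_STRING", "https://example.invalid/placeholder")
print(require_env("PROJECT_CONNECTION_STRING"))
```

In add_data.py you would call the helper after load_dotenv and drop the placeholder line.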
Part B - Connect to Your Azure AI Foundry Project
Goal: Create the project client from the PROJECT_CONNECTION_STRING value in your .env file, which the code passes to the client as its endpoint.
Append this to your script:
project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential()
)
Why:
This connects your script to your Azure AI Foundry project, allowing file uploads and vector store creation to happen in your workspace.
Part C - Upload Your Documents
Goal: Upload files from ./documents and collect their IDs.
Append this:
DOCS_DIR = "./documents"
if not os.path.isdir(DOCS_DIR):
    raise FileNotFoundError(
        f"Documents folder not found at {DOCS_DIR}. "
        "Create it and add your Contoso Pizza files (PDF, TXT, MD, etc.)."
    )
print(f"Uploading files from {DOCS_DIR} ...")
file_ids = []
for fname in os.listdir(DOCS_DIR):
    fpath = os.path.join(DOCS_DIR, fname)
    # skip directories and hidden files like .DS_Store
    if not os.path.isfile(fpath) or fname.startswith('.'):
        continue
    uploaded = project_client.agents.files.upload_and_poll(
        file_path=fpath,
        purpose=FilePurpose.AGENTS
    )
    file_ids.append(uploaded.id)
print(f"Uploaded {len(file_ids)} files.")
if not file_ids:
    raise RuntimeError("No files uploaded. Put files in ./documents and re-run.")
Why:
Your documents must be uploaded before they can be vectorized and made searchable.
Tip: Keep documents short and relevant (store info, hours, menus). Split very large docs when possible.
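If you do need to split a very large text document before uploading it, one simple approach is to break it at blank lines and pack paragraphs into pieces below a size limit. The sketch below assumes plain-text input. `split_text` and the `max_chars` limit are hypothetical choices for illustration, not values required by Azure AI Foundry.

```python
def split_text(text, max_chars=2000):
    """Split text into pieces of at most max_chars, breaking at blank lines where possible."""
    pieces, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Fallback: a single oversized paragraph is cut at fixed offsets.
        while len(paragraph) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            # The paragraph still fits into the piece being built.
            current = candidate
        else:
            # Close the current piece and start a new one with this paragraph.
            pieces.append(current)
            current = paragraph
    if current:
        pieces.append(current)
    return pieces


sample = "Store hours.\n\nMenu items.\n\nDelivery areas."
print(split_text(sample, max_chars=30))
```

Each returned piece could then be written to its own file in ./documents so that the upload loop picks it up like any other document.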
Part D - Create a Vector Store
Goal: Create an empty vector store that will store and index your document embeddings.
Append:
vector_store = project_client.agents.vector_stores.create_and_poll(
    data_sources=[],
    name="contoso-pizza-store-information"
)
print(f"Created vector store, ID: {vector_store.id}")
Why:
A vector store is what enables semantic search - it finds text that means the same thing as the user’s query, even if the words differ.
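To illustrate how matching by meaning can work, the sketch below compares vectors with cosine similarity, a common way to measure how closely two embeddings point in the same direction. The three-dimensional vectors are made-up numbers for illustration only. Real embeddings are produced by a model and have far more dimensions, and this is not a description of the service's internal scoring.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for two stored chunks (illustration only).
STORED = {
    "Opening hours: 11am to 10pm": [0.9, 0.1, 0.0],
    "Margherita: tomato, mozzarella, basil": [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "When do you close?".
# It shares no words with the hours chunk but points in a similar direction.
query_vector = [0.8, 0.3, 0.1]

best = max(STORED, key=lambda text: cosine_similarity(query_vector, STORED[text]))
print(best)
```

With these toy numbers the opening-hours chunk scores highest, even though the query text and the chunk text have no words in common.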
Part E - Vectorize Files into the Store
Goal: Add your uploaded files to the vector store and process them for search.
Append:
batch = project_client.agents.vector_store_file_batches.create_and_poll(
    vector_store_id=vector_store.id,
    file_ids=file_ids
)
print(f"Created vector store file batch, ID: {batch.id}")
Why:
This creates vector embeddings for your files so the agent can later retrieve relevant chunks via the File Search tool.
Final file
import os
from dotenv import load_dotenv
# Azure SDK imports
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import FilePurpose
# Load environment variables (expects PROJECT_CONNECTION_STRING in .env)
load_dotenv(override=True)
project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential()
)
DOCS_DIR = "./documents"
if not os.path.isdir(DOCS_DIR):
    raise FileNotFoundError(
        f"Documents folder not found at {DOCS_DIR}. "
        "Create it and add your Contoso Pizza files (PDF, TXT, MD, etc.)."
    )
print(f"Uploading files from {DOCS_DIR} ...")
file_ids = []
for fname in os.listdir(DOCS_DIR):
    fpath = os.path.join(DOCS_DIR, fname)
    # skip directories and hidden files like .DS_Store
    if not os.path.isfile(fpath) or fname.startswith('.'):
        continue
    uploaded = project_client.agents.files.upload_and_poll(
        file_path=fpath,
        purpose=FilePurpose.AGENTS
    )
    file_ids.append(uploaded.id)
print(f"Uploaded {len(file_ids)} files.")
if not file_ids:
    raise RuntimeError("No files uploaded. Put files in ./documents and re-run.")
vector_store = project_client.agents.vector_stores.create_and_poll(
    data_sources=[],
    name="contoso-pizza-store-information"
)
print(f"Created vector store, ID: {vector_store.id}")
batch = project_client.agents.vector_store_file_batches.create_and_poll(
    vector_store_id=vector_store.id,
    file_ids=file_ids
)
print(f"Created vector store file batch, ID: {batch.id}")
Run the Script
From your workshop/ directory, run:
python add_data.py
Example output:
Uploading files from ./documents ...
Uploaded 19 files.
Created vector store, ID: vs_ii6H96sVMeQcXICvj7e3DsrK
Created vector store file batch, ID: vsfb_47c68422adc24e0a915d0d14ca71a3cf
✅ Copy the vector store ID - you’ll use it in the next section.
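Copying the ID by hand works well for a workshop. As an optional alternative, the ID could be written to a small file by add_data.py and read back by agent.py. The sketch below shows that idea under stated assumptions: the file name `vector_store_id.txt` and both helper functions are hypothetical and are not part of the workshop code or the Azure SDK.

```python
from pathlib import Path

# Hypothetical file name; any location both scripts can reach would work.
ID_FILE = Path("vector_store_id.txt")


def save_vector_store_id(vector_store_id, path=ID_FILE):
    """Write the ID to disk (this would be called at the end of add_data.py)."""
    path.write_text(vector_store_id.strip() + "\n", encoding="utf-8")


def load_vector_store_id(path=ID_FILE):
    """Read the ID back (this would be called near the top of agent.py)."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found. Run add_data.py first.")
    return path.read_text(encoding="utf-8").strip()
```

With these helpers, add_data.py would call save_vector_store_id(vector_store.id) after creating the store, and agent.py would call load_vector_store_id() instead of using a pasted string.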
Step 2 - Add the File Search Tool
Now that you’ve created your vector store, let’s connect it to your agent.
In agent.py, make sure FileSearchTool and ToolSet are imported from azure.ai.agents.models. Then, right after you create your AIProjectClient, add:
# Create the File Search tool
vector_store_id = "<INSERT YOUR VECTOR STORE ID HERE>"
file_search = FileSearchTool(vector_store_ids=[vector_store_id])
Add the Tool to a Toolset
# Create the toolset
toolset = ToolSet()
toolset.add(file_search)
Create the Agent with Knowledge
Find the block where you create your agent and modify it to include the toolset:
agent = project_client.agents.create_agent(
    model="gpt-4o",
    name="my-agent",
    instructions=open("instructions.txt").read(),
    top_p=0.7,
    temperature=0.7,
    toolset=toolset  # Add the toolset to the agent
)
print(f"Created agent, ID: {agent.id}")
Step 3 - Run the Agent
Try it out:
python agent.py
Ask questions like:
“Which Contoso Pizza stores are open after 8pm?”
“Where is the nearest Contoso Pizza store?”
Type exit or quit to stop the conversation.
Recap
In this chapter, you:
- Learned how RAG grounds your agent with your own data
- Uploaded files from the ./documents directory
- Created and populated a vector store
- Added a File Search tool to your agent
- Extended your PizzaBot to answer questions about Contoso Pizza stores
Final code sample
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import MessageRole, FilePurpose, FunctionTool, FileSearchTool, ToolSet
from dotenv import load_dotenv
load_dotenv(override=True)
# Creating the AIProjectClient
project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential()
)
# Create the file_search tool
vector_store_id = "<INSERT COPIED VECTOR STORE ID>"
file_search = FileSearchTool(vector_store_ids=[vector_store_id])
# Creating the toolset
toolset = ToolSet()
toolset.add(file_search)
# Enable automatic function calling for this toolset so the agent can call functions directly
project_client.agents.enable_auto_function_calls(toolset)
# Creating the agent
agent = project_client.agents.create_agent(
    model="gpt-4o",
    name="my-agent",
    instructions=open("instructions.txt").read(),
    top_p=0.7,
    temperature=0.7,
    toolset=toolset  # Add the toolset to the agent
)
print(f"Created agent, ID: {agent.id}")
# Creating the thread
thread = project_client.agents.threads.create()
print(f"Created thread, ID: {thread.id}")
try:
    while True:
        # Get the user input
        user_input = input("You: ")
        # Break out of the loop
        if user_input.lower() in ["exit", "quit"]:
            break
        # Add a message to the thread
        message = project_client.agents.messages.create(
            thread_id=thread.id,
            role=MessageRole.USER,
            content=user_input
        )
        # Process the agent run
        run = project_client.agents.runs.create_and_process(
            thread_id=thread.id,
            agent_id=agent.id
        )
        # List messages and print the first text response from the agent
        messages = project_client.agents.messages.list(thread_id=thread.id)
        first_message = next(iter(messages), None)
        if first_message:
            print(next((item["text"]["value"] for item in first_message.content if item.get("type") == "text"), ""))
finally:
    # Clean up the agent when done
    project_client.agents.delete_agent(agent.id)
    print("Deleted agent")