feat: add dataset api #622

Open: wants to merge 37 commits into base: main

37 commits
bb5536a
feat: add dataset api
galzilber Aug 6, 2025
6955d14
fix
galzilber Aug 6, 2025
52e8eef
add new records
galzilber Aug 7, 2025
a1c4ab6
remove no need
galzilber Aug 7, 2025
41abe67
fix small issue
galzilber Aug 7, 2025
95ad80b
code cleanup
galzilber Aug 7, 2025
55ba20b
prettier
galzilber Aug 7, 2025
2e07925
enhance dataset integration tests with fetch adapter and improved req…
galzilber Aug 7, 2025
26b57b5
prettier
galzilber Aug 7, 2025
85f2547
fix issue
galzilber Aug 7, 2025
c442aa8
Merge branch 'main' into gz/add-datasets
galzilber Aug 7, 2025
9105734
update dataset routes
galzilber Aug 11, 2025
806ca56
Merge branch 'gz/add-datasets' of github.com:traceloop/openllmetry-js…
galzilber Aug 11, 2025
352403a
fix the recording
galzilber Aug 11, 2025
dae79c5
removeing old records
galzilber Aug 11, 2025
bebbc8c
fix linit and tests
galzilber Aug 11, 2025
8d81e87
fix test
galzilber Aug 11, 2025
c278f6b
change the recording
galzilber Aug 11, 2025
348619d
Fix the tests
galzilber Aug 11, 2025
880d752
Merge branch 'main' into gz/add-datasets
galzilber Aug 12, 2025
2ba1254
remove unused code
galzilber Aug 12, 2025
c0dccf8
Merge branch 'gz/add-datasets' of github.com:traceloop/openllmetry-js…
galzilber Aug 12, 2025
a6f4425
remove unusued
galzilber Aug 12, 2025
487f8b3
prettier
galzilber Aug 12, 2025
5cc2b42
record tests
galzilber Aug 12, 2025
8e8c1e2
remove unused code
galzilber Aug 12, 2025
132dd45
update dataset test recordings and remove unused pagination fields
galzilber Aug 12, 2025
2684fa8
fix bugs
galzilber Aug 12, 2025
05fdc29
remove unused setValue method from Row class
galzilber Aug 12, 2025
8f5e11b
fix bugs
galzilber Aug 12, 2025
5aad2ed
refactor Dataset and Datasets classes to simplify column creation and…
galzilber Aug 12, 2025
46b7e12
change the records
galzilber Aug 12, 2025
36b89a6
prettier
galzilber Aug 12, 2025
7702f40
fix
galzilber Aug 12, 2025
1c5af21
update dataset listing to retrieve all datasets instead of a limited …
galzilber Aug 12, 2025
c215659
refactor column definitions in dataset and test files for consistency…
galzilber Aug 12, 2025
a65b2b9
lint fix
galzilber Aug 12, 2025
packages/sample-app/package.json (2 changes: 2 additions & 0 deletions)
@@ -28,6 +28,8 @@
"run:pinecone": "npm run build && node dist/src/sample_pinecone.js",
"run:langchain": "npm run build && node dist/src/sample_langchain.js",
"run:sample_structured_output": "npm run build && node dist/src/sample_structured_output.js",
"run:dataset": "npm run build && node dist/src/sample_dataset.js",
"test:dataset": "npm run build && node dist/src/test_dataset_api.js",
"run:image_generation": "npm run build && node dist/src/sample_openai_image_generation.js",
"run:sample_edit": "npm run build && node dist/src/test_edit_only.js",
"run:sample_generate": "npm run build && node dist/src/test_generate_only.js",
packages/sample-app/src/sample_dataset.ts (320 changes: 320 additions & 0 deletions)
@@ -0,0 +1,320 @@
import * as traceloop from "@traceloop/node-server-sdk";
import OpenAI from "openai";

const main = async () => {
// Initialize Traceloop SDK
traceloop.initialize({
appName: "sample_dataset",
apiKey: process.env.TRACELOOP_API_KEY,
disableBatch: true,
traceloopSyncEnabled: true,
});
Comment on lines +6 to +11

🛠️ Refactor suggestion

Fail fast if TRACELOOP_API_KEY is missing

Validate the Traceloop API key before initializing, so that a missing key produces a clear message up front instead of confusing runtime behavior later.

   // Initialize Traceloop SDK
-  traceloop.initialize({
+  const traceloopApiKey = process.env.TRACELOOP_API_KEY;
+  if (!traceloopApiKey) {
+    console.warn("⚠️ TRACELOOP_API_KEY not set. Skipping dataset demo.");
+    return;
+  }
+  traceloop.initialize({
     appName: "sample_dataset",
-    apiKey: process.env.TRACELOOP_API_KEY,
+    apiKey: traceloopApiKey,
     disableBatch: true,
     traceloopSyncEnabled: true,
   });
🤖 Prompt for AI Agents
In packages/sample-app/src/sample_dataset.ts around lines 6 to 11, the Traceloop
API key is used directly which can lead to confusing runtime behavior if
TRACELOOP_API_KEY is missing; add a guard before traceloop.initialize that
checks process.env.TRACELOOP_API_KEY for existence/non-empty, and if missing
either throw a descriptive Error or log a clear message and call
process.exit(1), then pass the validated value into traceloop.initialize so
initialization only runs when a valid key is present.
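The fail-fast guard described above can be factored into a small helper. This is a sketch, not code from this PR: `requireEnv` is a hypothetical name, and it takes the environment as a parameter (for example `process.env`) so it can be exercised without touching real environment variables. It throws a descriptive `Error` for a missing or blank value, matching the "throw a descriptive Error" option in the prompt above.

```typescript
// Hypothetical helper (not part of @traceloop/node-server-sdk): returns the
// value of `name` from `env`, or throws if it is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is not set; export it before running this sample.`);
  }
  return value;
}

// Possible use at the top of main(), assuming Node's process.env:
// const apiKey = requireEnv(process.env, "TRACELOOP_API_KEY");
// traceloop.initialize({ appName: "sample_dataset", apiKey, disableBatch: true, traceloopSyncEnabled: true });
```

Throwing (rather than warning and returning) lets the existing `main().catch` handler report the problem and exit with code 1, so scripts running `npm run run:dataset` can detect the failure.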


await traceloop.waitForInitialization();

const client = traceloop.getClient();
if (!client) {
console.error("Failed to initialize Traceloop client");
return;
}

console.log("🚀 Dataset API Sample Application");
console.log("==================================\n");

try {
// 1. Create a new dataset for tracking LLM interactions
console.log("📝 Creating a new dataset...");
const dataset = await client.datasets.create({
name: `llm-interactions-${Date.now()}`,
description:
"Dataset for tracking OpenAI chat completions and user interactions",
});

console.log(`✅ Dataset created: ${dataset.name} (ID: ${dataset.id})\n`);

// 2. Define the schema by adding columns
console.log("🏗️ Adding columns to define schema...");

const columnsToAdd = [
{
name: "user_id",
type: "string" as const,
required: true,
description: "Unique identifier for the user",
},
{
name: "prompt",
type: "string" as const,
required: true,
description: "The user's input prompt",
},
{
name: "response",
type: "string" as const,
required: true,
description: "The AI model's response",
},
{
name: "model",
type: "string" as const,
required: true,
description: "The AI model used (e.g., gpt-4)",
},
{
name: "tokens_used",
type: "number" as const,
required: false,
description: "Total tokens consumed",
},
{
name: "response_time_ms",
type: "number" as const,
required: false,
description: "Response time in milliseconds",
},
{
name: "satisfaction_score",
type: "number" as const,
required: false,
description: "User satisfaction rating (1-5)",
},
{
name: "timestamp",
type: "string" as const,
required: true,
description: "When the interaction occurred",
},
];
await dataset.addColumn(columnsToAdd);

console.log("✅ Schema defined with 8 columns\n");
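The column definitions above also record which fields are required and which are numeric. The SDK or backend may already validate rows; this diff doesn't show it. As a sketch, a row can be checked locally against the same shape before calling `addRow`. `ColumnSpec` mirrors the object literals in `columnsToAdd`; `validateRow` is a hypothetical helper.

```typescript
// Mirrors the shape of the entries in `columnsToAdd`.
interface ColumnSpec {
  name: string;
  type: "string" | "number";
  required: boolean;
  description?: string;
}

type RowData = Record<string, unknown>;

// Hypothetical local check: returns a list of problems (empty means valid).
function validateRow(columns: ColumnSpec[], row: RowData): string[] {
  const problems: string[] = [];
  const known = new Set(columns.map((c) => c.name));

  for (const col of columns) {
    const value = row[col.name];
    if (value === undefined || value === null) {
      if (col.required) problems.push(`missing required column "${col.name}"`);
      continue;
    }
    if (col.type === "number" && (typeof value !== "number" || !Number.isFinite(value))) {
      problems.push(`column "${col.name}" expects a finite number`);
    }
    if (col.type === "string" && typeof value !== "string") {
      problems.push(`column "${col.name}" expects a string`);
    }
  }
  // Flag keys that are not part of the schema at all.
  for (const key of Object.keys(row)) {
    if (!known.has(key)) problems.push(`unknown column "${key}"`);
  }
  return problems;
}
```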

// 3. Simulate some LLM interactions and collect data
console.log("🤖 Simulating LLM interactions...");

const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});

const samplePrompts = [
"Explain machine learning in simple terms",
"Write a Python function to calculate fibonacci numbers",
"What are the benefits of using TypeScript?",
"How does async/await work in JavaScript?",
"Explain the concept of closures in programming",
];

const interactions = [];

for (let i = 0; i < samplePrompts.length; i++) {
const prompt = samplePrompts[i];
const userId = `user_${String(i + 1).padStart(3, "0")}`;

console.log(` Processing prompt ${i + 1}/${samplePrompts.length}...`);

const startTime = Date.now();

try {
// Make actual OpenAI API call
const completion = await openai.chat.completions.create({
model: "gpt-3.5-turbo",
messages: [{ role: "user", content: prompt }],
max_tokens: 150,
});

const endTime = Date.now();
const response =
completion.choices[0]?.message?.content || "No response";
const tokensUsed = completion.usage?.total_tokens || 0;
const responseTime = endTime - startTime;

const interaction = {
user_id: userId,
prompt: prompt,
response: response,
model: "gpt-3.5-turbo",
tokens_used: tokensUsed,
response_time_ms: responseTime,
satisfaction_score: Math.floor(Math.random() * 5) + 1, // Random satisfaction 1-5
timestamp: new Date().toISOString(),
};

interactions.push(interaction);

// Add individual row to dataset
await dataset.addRow(interaction);
} catch (error) {
console.log(
` ⚠️ Error with prompt ${i + 1}: ${error instanceof Error ? error.message : String(error)}`,
);

// Add error interaction data
const errorInteraction = {
user_id: userId,
prompt: prompt,
response: `Error: ${error instanceof Error ? error.message : String(error)}`,
model: "gpt-3.5-turbo",
tokens_used: 0,
response_time_ms: Date.now() - startTime,
satisfaction_score: 1,
timestamp: new Date().toISOString(),
};

interactions.push(errorInteraction);
await dataset.addRow(errorInteraction);
}
}

console.log(`✅ Added ${interactions.length} interaction records\n`);

// 4. Import additional data from CSV
console.log("📊 Importing additional data from CSV...");

const csvData = `user_id,prompt,response,model,tokens_used,response_time_ms,satisfaction_score,timestamp
user_006,"What is React?","React is a JavaScript library for building user interfaces...","gpt-3.5-turbo",85,1200,4,"2024-01-15T10:30:00Z"
user_007,"Explain Docker","Docker is a containerization platform that allows you to package applications...","gpt-3.5-turbo",120,1500,5,"2024-01-15T10:35:00Z"
user_008,"What is GraphQL?","GraphQL is a query language and runtime for APIs...","gpt-3.5-turbo",95,1100,4,"2024-01-15T10:40:00Z"`;

await dataset.fromCSV(csvData, { hasHeader: true });
console.log("✅ Imported 3 additional records from CSV\n");
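The CSV sample quotes every text field. How `fromCSV` tokenizes its input isn't shown in this diff, so as an illustration of the quoting rules such a parser has to honor (RFC 4180: commas inside quotes, `""` as an escaped quote), here is a minimal splitter for one record. `splitCsvLine` is hypothetical and ignores quoted newlines, which a full parser would also need to handle.

```typescript
// Minimal RFC 4180-style splitter for a single CSV record (no embedded newlines).
// Illustration only; not the SDK's fromCSV implementation.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // field separator outside quotes
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}
```

Every field comes back as a string (for example `"85"`), so numeric columns such as `tokens_used` need an explicit conversion somewhere between the CSV text and the stored row.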

// 5. Get dataset info
Contributor

Statistics retrieval: deriving stats from getRows() and getColumns() instead of calling getStats() is clear, but getRows() with no arguments loads every row into memory, which may hurt performance on large datasets. Consider paginating or using a dedicated stats method.
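The paging approach suggested in the comment might look like the sketch below. It assumes a page-fetching function `fetchPage(limit, offset)` that resolves to at most `limit` rows; that signature is hypothetical (this diff only calls `getRows()` with no arguments, and one commit removes unused pagination fields), so it would need adapting to whatever the SDK exposes. Only one page is held in memory, and the mean is updated incrementally instead of collecting every value.

```typescript
// Hypothetical page fetcher (not an SDK API): resolves to at most `limit`
// rows starting at `offset`.
type FetchPage<T> = (limit: number, offset: number) => Promise<T[]>;

// Visits every row, holding one page in memory at a time.
// A page shorter than `pageSize` is treated as the last one.
async function forEachRow<T>(
  fetchPage: FetchPage<T>,
  visit: (row: T) => void,
  pageSize = 100,
): Promise<number> {
  if (pageSize < 1) throw new RangeError("pageSize must be at least 1");
  let seen = 0;
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(pageSize, offset);
    page.forEach(visit);
    seen += page.length;
    if (page.length < pageSize) return seen;
  }
}

// Incremental mean: keeps a count and the current mean, not the values.
class RunningMean {
  private count = 0;
  private current = 0;

  add(x: number): void {
    this.count += 1;
    this.current += (x - this.current) / this.count;
  }

  value(): number | undefined {
    return this.count === 0 ? undefined : this.current;
  }
}

// Mean of one numeric column (rows shaped like `row.data` in the sample).
async function columnMean(
  fetchPage: FetchPage<{ data: Record<string, unknown> }>,
  column: string,
): Promise<number | undefined> {
  const mean = new RunningMean();
  await forEachRow(fetchPage, (row) => {
    const v = row.data[column];
    if (typeof v === "number" && Number.isFinite(v)) mean.add(v);
  });
  return mean.value();
}
```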

console.log("📈 Getting dataset information...");
const rows = await dataset.getRows(); // Get all rows
const allColumns = await dataset.getColumns(); // Get all columns
console.log(` • Total rows: ${rows.length}`);
console.log(` • Total columns: ${allColumns.length}`);
console.log(` • Last updated: ${dataset.updatedAt}\n`);

// 6. Retrieve and analyze some data
console.log("🔍 Analyzing collected data...");
const analysisRows = rows.slice(0, 10); // Get first 10 rows for analysis

if (analysisRows.length > 0) {
console.log(` • Retrieved ${analysisRows.length} rows for analysis`);

// Calculate average satisfaction score
const satisfactionScores = analysisRows
.map((row) => row.data.satisfaction_score as number)
.filter((score) => score != null);

if (satisfactionScores.length > 0) {
const avgSatisfaction =
satisfactionScores.reduce((a, b) => a + b, 0) /
satisfactionScores.length;
console.log(
` • Average satisfaction score: ${avgSatisfaction.toFixed(2)}/5`,
);
}

// Calculate average response time
const responseTimes = analysisRows
.map((row) => row.data.response_time_ms as number)
.filter((time) => time != null);

if (responseTimes.length > 0) {
const avgResponseTime =
responseTimes.reduce((a, b) => a + b, 0) / responseTimes.length;
console.log(
` • Average response time: ${avgResponseTime.toFixed(0)}ms`,
);
}

// Show sample interactions
console.log("\n📋 Sample interactions:");
analysisRows.slice(0, 3).forEach((row, index) => {
console.log(` ${index + 1}. User: "${row.data.prompt}"`);
console.log(
` Response: "${String(row.data.response).substring(0, 80)}..."`,
);
console.log(` Satisfaction: ${row.data.satisfaction_score}/5\n`);
});
}
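The averages above rely on `as number`, which is a compile-time assertion only and converts nothing at runtime. If a numeric column ever comes back as a string (for example, if the CSV import stores `"4"` rather than `4`; this diff doesn't show which), `reduce((a, b) => a + b, 0)` would concatenate strings instead of adding. A defensive sketch, with `numericValues` and `mean` as hypothetical helpers:

```typescript
// Only the part of a row that the analysis reads.
type RowLike = { data: Record<string, unknown> };

// Collects finite numbers from `column`, converting numeric strings such as "4".
// Blank or non-numeric strings, null and undefined are skipped.
function numericValues(rows: RowLike[], column: string): number[] {
  const out: number[] = [];
  for (const row of rows) {
    const raw = row.data[column];
    const n =
      typeof raw === "number"
        ? raw
        : typeof raw === "string" && raw.trim() !== ""
          ? Number(raw)
          : NaN;
    if (Number.isFinite(n)) out.push(n);
  }
  return out;
}

// Arithmetic mean, or undefined for an empty list.
function mean(values: number[]): number | undefined {
  if (values.length === 0) return undefined;
  return values.reduce((a, b) => a + b, 0) / values.length;
}
```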

// 7. Get dataset versions (if any exist)
console.log("📚 Checking dataset versions...");
try {
const versions = await dataset.getVersions();
console.log(` • Total versions: ${versions.total}`);

if (versions.versions.length > 0) {
console.log(" • Available versions:");
versions.versions.forEach((version) => {
console.log(
` - ${version.version} (published: ${version.publishedAt})`,
);
});
} else {
console.log(" • No published versions yet");
}
} catch (error) {
console.log(` ⚠️ Could not retrieve versions: ${error.message}`);
}
Comment on lines +236 to +252

⚠️ Potential issue

Type-safe error handling in catch block

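Reading error.message from the catch variable fails TypeScript compilation when useUnknownInCatchVariables (part of strict) is enabled, because the variable is then typed unknown. At runtime it prints undefined when a non-Error value such as a string was thrown, and throws a TypeError when that value is null or undefined.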

-    } catch (error) {
-      console.log(`  ⚠️ Could not retrieve versions: ${error.message}`);
+    } catch (error: unknown) {
+      const msg = error instanceof Error ? error.message : String(error);
+      console.log(`  ⚠️ Could not retrieve versions: ${msg}`);
     }
🤖 Prompt for AI Agents
In packages/sample-app/src/sample_dataset.ts around lines 236 to 252, the catch
block assumes error has a .message which is not type-safe; change the catch
signature to catch (error: unknown) and guard the value before accessing
.message — e.g. if (error instanceof Error) use error.message, otherwise convert
to a string with String(error) (or JSON.stringify / util.inspect) and include that in
the log so TypeScript compiles and you never access .message on a non-Error.
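The expression `error instanceof Error ? error.message : String(error)` already appears twice in the per-prompt catch block, and the review asks for it in three more places. One option is a single shared helper; `describeError` below is a hypothetical name, not an SDK function, and the JSON fallback is a design choice so that thrown plain objects don't print as `[object Object]`.

```typescript
// Hypothetical helper: turns any thrown value into a printable message.
function describeError(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === "string") return error;
  try {
    // JSON.stringify returns undefined for undefined, functions and symbols.
    const json = JSON.stringify(error);
    if (json !== undefined) return json;
  } catch {
    // Circular structures (and BigInt values) make JSON.stringify throw.
  }
  return String(error);
}

// In the sample, the versions catch block could then read:
// console.log(`  ⚠️ Could not retrieve versions: ${describeError(error)}`);
```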


console.log();

// 8. Publish the dataset
console.log("🚀 Publishing dataset...");
await dataset.publish({
version: "v1.0",
description:
"Initial release of LLM interactions dataset with sample data",
});

console.log(
`✅ Dataset published! Status: ${dataset.published ? "Published" : "Draft"}\n`,
);

// 9. List all datasets (to show our new one)
console.log("📑 Listing all datasets...");
const datasetsList = await client.datasets.list(); // Get all datasets
console.log(` • Found ${datasetsList.total} total datasets`);
console.log(" • Recent datasets:");

datasetsList.datasets.slice(0, 3).forEach((ds, index) => {
const isOurDataset = ds.id === dataset.id;
console.log(
` ${index + 1}. ${ds.name}${isOurDataset ? " ← (just created!)" : ""}`,
);
console.log(` Description: ${ds.description || "No description"}`);
console.log(` Published: ${ds.published ? "Yes" : "No"}\n`);
});
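The listing step labels its first three entries "Recent datasets", but `slice(0, 3)` takes whatever order `list()` returns, and the diff doesn't say that order is by recency. If listed items carry an update timestamp (the sample reads `dataset.updatedAt` on a single dataset; that list items expose the same field as an ISO-8601 string is an assumption), sorting explicitly would make the label accurate. `DatasetSummary` and `mostRecent` are hypothetical.

```typescript
// Assumed shape of a listed dataset; only the fields used here.
interface DatasetSummary {
  id: string;
  name: string;
  updatedAt?: string; // assumed ISO-8601 timestamp
}

// Returns the `n` most recently updated datasets without mutating the input.
// Entries with a missing or unparsable timestamp sort last.
function mostRecent(datasets: DatasetSummary[], n: number): DatasetSummary[] {
  const time = (d: DatasetSummary): number => {
    const t = d.updatedAt ? Date.parse(d.updatedAt) : NaN;
    return Number.isNaN(t) ? -Infinity : t;
  };
  return [...datasets].sort((a, b) => time(b) - time(a)).slice(0, n);
}
```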

// 10. Demonstrate dataset retrieval
console.log("🔎 Testing dataset retrieval...");
const retrievedDataset = await client.datasets.get(dataset.slug);
if (retrievedDataset) {
console.log(
`✅ Retrieved dataset by slug: ${retrievedDataset.name} (ID: ${retrievedDataset.id})`,
);
} else {
console.log("❌ Could not retrieve dataset");
}

console.log("\n🎉 Dataset API demonstration completed successfully!");
console.log("\n💡 Key features demonstrated:");
console.log(" • Dataset creation and schema definition");
console.log(" • Real-time data collection from LLM interactions");
console.log(" • CSV data import capabilities");
console.log(" • Statistical analysis of collected data");
console.log(" • Dataset publishing and version management");
console.log(" • Search and retrieval operations");

console.log(`\n📊 Dataset Summary:`);
console.log(` • Name: ${dataset.name}`);
console.log(` • ID: ${dataset.id}`);
console.log(` • Published: ${dataset.published ? "Yes" : "No"}`);
console.log(` • Total interactions recorded: ${rows.length}`);
} catch (error) {
console.error("❌ Error in dataset operations:", error.message);
if (error.stack) {
console.error("Stack trace:", error.stack);
}
}
Comment on lines +309 to +313

⚠️ Potential issue

Type-safe error handling in top-level try/catch

Avoid assuming error is an Error.

-  } catch (error) {
-    console.error("❌ Error in dataset operations:", error.message);
-    if (error.stack) {
-      console.error("Stack trace:", error.stack);
-    }
+  } catch (error: unknown) {
+    const err = error instanceof Error ? error : new Error(String(error));
+    console.error("❌ Error in dataset operations:", err.message);
+    if (err.stack) {
+      console.error("Stack trace:", err.stack);
+    }
   }
🤖 Prompt for AI Agents
In packages/sample-app/src/sample_dataset.ts around lines 309 to 313, the
top-level catch assumes `error` is an Error and accesses .message/.stack
directly; change the catch to accept unknown, then perform a type check (e.g.,
if (error instanceof Error) { log error.message and error.stack } else { log a
safe stringified representation such as String(error) or JSON.stringify(error)
}) so logging is type-safe and avoids runtime exceptions. Ensure the
catch signature and subsequent logging use the safe checks and fallbacks.

};

// Error handling for the main function
main().catch((error) => {
console.error("💥 Application failed:", error.message);
process.exit(1);
});
Comment on lines +317 to +320

⚠️ Potential issue

Type-safe error handling in main().catch

Same issue: error.message on an untyped catch variable.

-main().catch((error) => {
-  console.error("💥 Application failed:", error.message);
-  process.exit(1);
-});
+main().catch((error: unknown) => {
+  const err = error instanceof Error ? error : new Error(String(error));
+  console.error("💥 Application failed:", err.message);
+  process.exit(1);
+});
🤖 Prompt for AI Agents
In packages/sample-app/src/sample_dataset.ts around lines 317 to 320, the catch
handler uses error.message on an untyped catch variable which is unsafe; change
the handler to normalize the caught value (e.g. check if error instanceof Error
and use error.message, otherwise use String(error)) and log that normalized
message (or the full error object) instead of assuming error.message so the code
is type-safe and won't crash on non-Error throws.
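The top-level catch inside `main` and `main().catch` normalize and log errors in nearly the same way, so both could share one wrapper. A sketch under stated assumptions: `runMain` and its `report`/`setExitCode` parameters are hypothetical and are injected so the wrapper can be tested without ending the process. In the sample one would pass `console.error` and `(code) => { process.exitCode = code; }`; per the Node.js docs, setting `process.exitCode` instead of calling `process.exit(1)` lets pending asynchronous work finish before the process exits.

```typescript
type Report = (...args: unknown[]) => void;

// Hypothetical wrapper: runs `main`, reports whatever it throws, and records
// a non-zero exit code instead of terminating immediately.
async function runMain(
  main: () => Promise<void>,
  report: Report,
  setExitCode: (code: number) => void,
): Promise<void> {
  try {
    await main();
  } catch (error: unknown) {
    if (error instanceof Error) {
      report("💥 Application failed:", error.message);
      if (error.stack) report("Stack trace:", error.stack);
    } else {
      // Wrapping a non-Error would only yield a stack pointing at this catch,
      // so just print the value itself.
      report("💥 Application failed:", String(error));
    }
    setExitCode(1);
  }
}
```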
