Scoring Algorithm

The KnowledgePulse scoring engine evaluates reasoning traces using a composite formula that combines four independent quality dimensions. In Phase 2, the engine introduces domain-specific weight profiles that tailor the scoring emphasis to different task domains, and enforces a 100ms performance budget per evaluation.

Composite Formula

The overall quality score is computed as a weighted sum of four normalized dimensions:

score = C × wC + N × wN + D × wD + O × wO

Where:

Symbol   Dimension            Range
C        Complexity           0.0 – 1.0
N        Novelty              0.0 – 1.0
D        Tool Diversity       0.0 – 1.0
O        Outcome Confidence   0.0 – 1.0

The weights (wC, wN, wD, wO) vary by domain and always sum to 1.0.
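
For reference, the composite is just this weighted sum in code. A minimal sketch (the standalone compositeScore helper is illustrative, not an SDK export; the weight field names match the ScoringWeights interface shown under Scoring Interface below):

// Matches the ScoringWeights interface shown under "Scoring Interface" below.
interface ScoringWeights {
  complexity: number;        // wC
  novelty: number;           // wN
  toolDiversity: number;     // wD
  outcomeConfidence: number; // wO
}

// Weighted sum of the four normalized dimensions.
function compositeScore(
  dims: { C: number; N: number; D: number; O: number },
  w: ScoringWeights,
): number {
  return (
    dims.C * w.complexity +
    dims.N * w.novelty +
    dims.D * w.toolDiversity +
    dims.O * w.outcomeConfidence
  );
}

// Worked example with the finance profile (wC=0.20, wN=0.25, wD=0.10, wO=0.45):
// 0.6*0.20 + 0.5*0.25 + 0.4*0.10 + 0.92*0.45 = 0.12 + 0.125 + 0.04 + 0.414 = 0.699
const s = compositeScore(
  { C: 0.6, N: 0.5, D: 0.4, O: 0.92 },
  { complexity: 0.2, novelty: 0.25, toolDiversity: 0.1, outcomeConfidence: 0.45 },
);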

Domain-Specific Weight Profiles

Different task domains prioritize different quality signals. A finance trace benefits most from high outcome confidence, while a coding trace benefits from diverse tool usage. The scoring engine selects the weight profile automatically based on metadata.task_domain.

Available Profiles

Domain             wC (Complexity)   wN (Novelty)   wD (Tool Diversity)   wO (Outcome)
default            0.25              0.35           0.15                  0.25
finance            0.20              0.25           0.10                  0.45
code               0.20              0.30           0.30                  0.20
medical            0.15              0.20           0.10                  0.55
customer_service   0.20              0.30           0.20                  0.30
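
Internally, this amounts to a lookup table keyed by domain with a default fallback. A minimal sketch of that idea (WEIGHT_PROFILES and resolveWeights are illustrative names, not confirmed SDK internals; ScoringWeights is the shape sketched above):

// Illustrative representation of the profile table; not the SDK's actual internals.
const WEIGHT_PROFILES: Record<string, ScoringWeights> = {
  default:          { complexity: 0.25, novelty: 0.35, toolDiversity: 0.15, outcomeConfidence: 0.25 },
  finance:          { complexity: 0.20, novelty: 0.25, toolDiversity: 0.10, outcomeConfidence: 0.45 },
  code:             { complexity: 0.20, novelty: 0.30, toolDiversity: 0.30, outcomeConfidence: 0.20 },
  medical:          { complexity: 0.15, novelty: 0.20, toolDiversity: 0.10, outcomeConfidence: 0.55 },
  customer_service: { complexity: 0.20, novelty: 0.30, toolDiversity: 0.20, outcomeConfidence: 0.30 },
};

// Unknown or missing domains fall back to default; no error is thrown.
function resolveWeights(taskDomain?: string): ScoringWeights {
  return WEIGHT_PROFILES[taskDomain ?? "default"] ?? WEIGHT_PROFILES.default;
}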

Design Rationale

  • Finance heavily weights outcome confidence because financial analysis demands accurate, verifiable conclusions.
  • Code heavily weights tool diversity because effective coding agents leverage multiple tools (linters, type checkers, test runners).
  • Medical has the highest outcome confidence weight (0.55) because correctness is critical in medical reasoning.
  • Customer service balances novelty and outcome confidence, rewarding creative yet effective problem resolution.

Using Domain Profiles

Domain selection happens automatically through the trace metadata:

import { evaluateValue } from "@knowledgepulse/sdk";
import type { ReasoningTrace } from "@knowledgepulse/sdk";

const trace: ReasoningTrace = {
  "@context": "https://knowledgepulse.dev/schema/v1",
  "@type": "ReasoningTrace",
  id: "kp:trace:finance-demo-001",
  metadata: {
    created_at: new Date().toISOString(),
    task_domain: "finance", // ← selects finance weight profile
    success: true,
    quality_score: 0,
    visibility: "network",
    privacy_level: "aggregated",
  },
  task: { objective: "Analyze TSMC Q4 earnings report" },
  steps: [
    { step_id: 0, type: "thought", content: "Extracting revenue and margin data" },
    { step_id: 1, type: "tool_call", tool: { name: "financial_data_api" }, input: { ticker: "TSM" } },
    { step_id: 2, type: "observation", content: "Revenue: $26.3B, up 14.3% YoY" },
    { step_id: 3, type: "tool_call", tool: { name: "comparison_tool" }, input: { metric: "gross_margin" } },
    { step_id: 4, type: "observation", content: "Gross margin 57.9%, above industry average" },
  ],
  outcome: {
    result_summary: "Strong quarterly performance driven by AI chip demand",
    confidence: 0.92,
  },
};

const score = await evaluateValue(trace);
// With finance weights, high outcome confidence (0.92) contributes more
console.log(score); // e.g. 0.78

If the domain does not match any registered profile, the engine falls back to the default weights; unknown domains are handled silently, and no error is thrown.

Rule-Based Overrides

After computing the weighted composite score, three deterministic overrides are applied in order:

1. Single-Step Penalty

if (steps.length === 1 && steps[0].type === "thought") score = 0.1;

A trace with a single thought-only step has minimal knowledge value. The score is forced to 0.1 regardless of other factors.

2. Error-Recovery Bonus

if (errorRecovery > 2 && metadata.success) score = Math.min(1.0, score + 0.1);

Traces that recover from more than 2 errors and still succeed demonstrate valuable resilience. A +0.1 bonus is added, capped at 1.0.

3. Zero-Diversity Penalty

if (uniqueTools <= 1 && steps.some(s => s.tool)) score = Math.max(0.0, score - 0.1);

If a trace uses tools but only one unique tool, a -0.1 penalty is applied, floored at 0.0. This encourages diverse tool usage.

note

The single-step penalty takes precedence. If a trace has exactly one thought step, the score is set to 0.1 first. The error-recovery bonus and zero-diversity penalty then apply on top of that value if their conditions are also met.
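
Putting the three overrides together in their application order looks roughly like this (a sketch: countErrorRecoveries is a hypothetical helper, and the field accesses assume the ReasoningTrace shape from the example above):

function applyOverrides(score: number, trace: ReasoningTrace): number {
  const steps = trace.steps;

  // 1. Single-step penalty: force the score to 0.1 before the other rules run.
  if (steps.length === 1 && steps[0].type === "thought") {
    score = 0.1;
  }

  // 2. Error-recovery bonus: +0.1 for resilient, successful traces, capped at 1.0.
  //    countErrorRecoveries is a hypothetical helper counting recovered errors.
  const errorRecovery = countErrorRecoveries(steps);
  if (errorRecovery > 2 && trace.metadata.success) {
    score = Math.min(1.0, score + 0.1);
  }

  // 3. Zero-diversity penalty: -0.1 when tools are used but only one unique tool appears,
  //    floored at 0.0.
  const uniqueTools = new Set(steps.filter((s) => s.tool).map((s) => s.tool!.name)).size;
  if (uniqueTools <= 1 && steps.some((s) => s.tool)) {
    score = Math.max(0.0, score - 0.1);
  }

  return score;
}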

Temporal Decay for Novelty

The novelty dimension uses embedding-based similarity against a local vector cache. As the cache accumulates traces over time, the novelty score for semantically similar traces naturally decreases. This creates an implicit temporal decay effect:

  1. Fresh trace in an empty cache: novelty defaults to 0.5.
  2. New unique trace: novelty approaches 1.0 (low similarity to existing vectors).
  3. Repeated trace pattern: novelty approaches 0.0 (high similarity to cached vectors).

The vector cache supports TTL-based eviction (introduced in Phase 2), so cached entries expire after a configurable time window. This ensures that a topic revisited after the TTL period regains a higher novelty score.

import { VectorCache } from "@knowledgepulse/sdk";

const cache = new VectorCache({
  maxElements: 1000,
  dimensions: 384,
  ttlMs: 3600000, // 1 hour: entries expire after this
});
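
The novelty value itself is then one minus the highest similarity found in the cache. A conceptual sketch of that relationship (the size, querySimilarity, and insert members are assumptions for illustration, not confirmed VectorCache API):

// Conceptual only: the VectorCache method names used here are assumed, not documented API.
function noveltyFor(embedding: number[], cache: VectorCache): number {
  if (cache.size === 0) return 0.5;                // empty cache: neutral default
  const maxSim = cache.querySimilarity(embedding); // highest similarity to any cached vector
  cache.insert(embedding);                         // remember this trace for future scoring
  return 1.0 - maxSim;                             // unique → near 1.0, repeated → near 0.0
}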

Performance Budget

The scoring function is designed to complete within 100ms for typical traces. Key implementation choices that support this constraint:

Component               Strategy                                      Latency
Vector cache            Brute-force linear scan over 1,000 vectors    < 1ms
Embedder                Lazy-loaded, cached after first invocation    ~50ms first call, ~5ms subsequent
Composite calculation   Pure arithmetic, no I/O                       < 0.1ms
Rule overrides          Three conditional checks                      < 0.01ms

If the optional embedder (@huggingface/transformers) is not installed, novelty defaults to 0.5 and the entire evaluation runs in under 1ms.
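
You can check the budget empirically with a simple timing harness (performance.now() is standard in Node and browsers; `trace` is any ReasoningTrace, such as the finance example above):

import { performance } from "node:perf_hooks";
import { evaluateValue } from "@knowledgepulse/sdk";

const t0 = performance.now();
const score = await evaluateValue(trace); // `trace` as constructed earlier
const elapsedMs = performance.now() - t0;

console.log(`score=${score.toFixed(3)} in ${elapsedMs.toFixed(1)}ms`);
// Expect well under 100ms; under 1ms when the optional embedder is not installed.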

Scoring Interface

interface ScoringWeights {
  complexity: number;
  novelty: number;
  toolDiversity: number;
  outcomeConfidence: number;
}

function evaluateValue(trace: ReasoningTrace): Promise<number>;

The function returns a Promise<number> between 0.0 and 1.0. It is stateful across invocations within the same process because the local vector cache persists for novelty calculations.
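
Because the in-process vector cache persists across calls, re-evaluating a semantically identical trace generally yields a lower score the second time, as its novelty drops:

const first = await evaluateValue(trace);  // cache sparse: novelty relatively high
const second = await evaluateValue(trace); // near-duplicate of a cached vector: novelty falls
console.log(first >= second); // typically true; exact values depend on weights and content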

Example: Comparing Domain Profiles

The same trace evaluated under different domains produces different scores due to weight differences:

// Same trace structure, different task_domain values
const domains = ["default", "finance", "code", "medical", "customer_service"];

for (const domain of domains) {
  const trace = createTrace({ task_domain: domain });
  const score = await evaluateValue(trace);
  console.log(`${domain}: ${score.toFixed(3)}`);
}

// Example output (varies by trace content):
// default: 0.623
// finance: 0.714 (high confidence rewarded)
// code: 0.598 (tool diversity emphasized)
// medical: 0.751 (confidence dominates)
// customer_service: 0.645 (balanced)