AI & Machine Learning

Embeddings Search: Vector Similarity with SQL Databases

Master embeddings search and vector similarity in PostgreSQL. Learn implementation strategies, performance optimization, and real-world applications for modern AI systems.

· By PropTechUSA AI

The property technology landscape is undergoing a seismic shift. Traditional keyword-based search systems are giving way to sophisticated semantic search capabilities powered by vector embeddings. As PropTech companies grapple with increasingly complex data requirements—from natural language property descriptions to image-based property matching—the ability to implement efficient embeddings search within familiar SQL databases has become a critical competitive advantage.

The Evolution of Search in Property Technology

From Keywords to Semantic Understanding

Traditional property search relied heavily on exact keyword matches and structured filters. A user searching for "cozy downtown apartment" would only find listings containing those exact terms. This approach left significant gaps in search quality and user experience.

Vector embeddings revolutionize this paradigm by converting text, images, and other data types into high-dimensional numerical representations that capture semantic meaning. Properties described as "intimate urban dwelling" and "cozy downtown apartment" now cluster together in vector space, enabling truly intelligent search experiences.

The PostgreSQL Advantage

While specialized vector databases have emerged, PostgreSQL's extension ecosystem—particularly pgvector—provides a compelling alternative. Organizations can leverage existing database infrastructure, maintain ACID compliance, and benefit from familiar SQL operations while adding powerful vector similarity capabilities.

💡
Pro Tip
PostgreSQL with pgvector offers the best of both worlds: battle-tested relational database features combined with cutting-edge vector similarity search.

Real-World Impact in PropTech

At PropTechUSA.ai, we've observed dramatic improvements in search relevance when clients transition from keyword-based to embedding-powered search systems. Property matching accuracy typically improves by 40-60%, while user engagement metrics show corresponding increases.

Understanding Vector Embeddings and Similarity

The Mathematics Behind Embeddings

Vector embeddings transform discrete data into continuous vector spaces where similar items cluster together. For property descriptions, a model like OpenAI's text-embedding-ada-002 converts text into 1536-dimensional vectors that capture semantic relationships.

The key insight is that mathematical operations in vector space correspond to semantic relationships in the original domain. Properties with similar characteristics will have vectors with high cosine similarity, typically measured as:

sql
SELECT
  property_id,
  description,
  1 - (embedding <=> query_embedding) AS similarity_score
FROM properties
ORDER BY embedding <=> query_embedding
LIMIT 10;

Distance Metrics and Their Applications

Different distance metrics serve different use cases in property technology:

  • Cosine Distance: Ideal for text embeddings and property descriptions
  • Euclidean Distance: Better for numerical features and property attributes
  • Dot Product: Useful when embedding magnitudes carry meaning

The choice of distance metric significantly impacts search results. Cosine distance normalizes for vector magnitude, making it robust for text-based property searches where document length shouldn't affect relevance.
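To make the tradeoffs concrete, here is a small, self-contained TypeScript sketch of the three metrics. The helper names are our own illustrations; in pgvector itself, cosine distance, Euclidean (L2) distance, and negative inner product are exposed as the `<=>`, `<->`, and `<#>` operators respectively.

```typescript
// Illustrative implementations of the three distance metrics discussed above.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function cosineDistance(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dotProduct(v, v));
  return 1 - dotProduct(a, b) / (norm(a) * norm(b));
}

// Cosine distance ignores magnitude: scaling a vector leaves it unchanged.
const short = [1, 2, 3];
const long = [2, 4, 6]; // same direction, twice the magnitude
console.log(cosineDistance(short, long));    // ~0: identical direction
console.log(euclideanDistance(short, long)); // > 0: magnitudes differ
```

The last two lines show exactly why cosine distance suits text search: a long and a short description pointing in the same semantic direction score as identical, while Euclidean distance would penalize the length difference.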

Embedding Dimensionality Considerations

Higher-dimensional embeddings capture more nuanced relationships but increase storage and computational requirements. Modern embedding models typically use 768, 1024, or 1536 dimensions. For property data, we've found 1536-dimensional embeddings provide the optimal balance between semantic richness and performance.

⚠️
Warning
Higher dimensionality doesn't always mean better results. Test different embedding models and dimensions with your specific property dataset to find the optimal configuration.

Implementation Strategies with PostgreSQL

Setting Up pgvector

Begin by installing and configuring pgvector in your PostgreSQL instance:

sql
-- Install pgvector extension
CREATE EXTENSION vector;

-- Create table with vector column
CREATE TABLE property_embeddings (
  id SERIAL PRIMARY KEY,
  property_id INTEGER NOT NULL UNIQUE, -- unique so ON CONFLICT (property_id) works on upserts
  description TEXT,
  embedding vector(1536),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Create index for efficient similarity search
CREATE INDEX ON property_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Generating and Storing Embeddings

Here's a TypeScript implementation for generating and storing property embeddings:

typescript
import { OpenAI } from 'openai';
import { Pool, PoolConfig } from 'pg';

interface PropertyEmbedding {
  propertyId: number;
  description: string;
  embedding: number[];
}

class EmbeddingService {
  private openai: OpenAI;
  private db: Pool;

  constructor(openaiKey: string, dbConfig: PoolConfig) {
    this.openai = new OpenAI({ apiKey: openaiKey });
    this.db = new Pool(dbConfig);
  }

  async generateEmbedding(text: string): Promise<number[]> {
    const response = await this.openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: text,
    });
    return response.data[0].embedding;
  }

  async storePropertyEmbedding(property: PropertyEmbedding): Promise<void> {
    const query = `
      INSERT INTO property_embeddings (property_id, description, embedding)
      VALUES ($1, $2, $3)
      ON CONFLICT (property_id)
      DO UPDATE SET
        description = EXCLUDED.description,
        embedding = EXCLUDED.embedding,
        updated_at = NOW();
    `;
    await this.db.query(query, [
      property.propertyId,
      property.description,
      `[${property.embedding.join(',')}]`, // pgvector accepts the '[x,y,...]' text format
    ]);
  }
}

Implementing Similarity Search

Create robust search functionality that combines vector similarity with traditional filters:

typescript
interface SearchParams {
  query: string;
  priceRange?: [number, number];
  location?: string;
  limit?: number;
}

interface SearchResult {
  propertyId: number;
  description: string;
  similarityScore: number;
  metadata?: any;
}

class PropertySearchService {
  constructor(private db: Pool, private embedder: EmbeddingService) {}

  async searchSimilarProperties(params: SearchParams): Promise<SearchResult[]> {
    // Generate an embedding for the search query
    const queryEmbedding = await this.embedder.generateEmbedding(params.query);

    const query = `
      SELECT
        p.id AS property_id,
        p.description,
        p.price,
        p.location,
        1 - (pe.embedding <=> $1) AS similarity_score
      FROM properties p
      JOIN property_embeddings pe ON p.id = pe.property_id
      WHERE ($2::int IS NULL OR p.price BETWEEN $2 AND $3)
        AND ($4::text IS NULL OR p.location ILIKE $4)
      ORDER BY pe.embedding <=> $1
      LIMIT $5;
    `;

    const result = await this.db.query(query, [
      `[${queryEmbedding.join(',')}]`,
      params.priceRange?.[0] ?? null,
      params.priceRange?.[1] ?? null,
      params.location ? `%${params.location}%` : null,
      params.limit ?? 20,
    ]);

    return result.rows.map((row) => ({
      propertyId: row.property_id,
      description: row.description,
      similarityScore: row.similarity_score,
      metadata: {
        price: row.price,
        location: row.location,
      },
    }));
  }
}

Hybrid Search Implementation

Combine vector similarity with full-text search for optimal results:

sql
WITH vector_results AS (
  SELECT
    property_id,
    1 - (embedding <=> $1) AS vector_score
  FROM property_embeddings
  ORDER BY embedding <=> $1
  LIMIT 100
),
text_results AS (
  SELECT
    id AS property_id,
    ts_rank_cd(search_vector, plainto_tsquery($2)) AS text_score
  FROM properties
  WHERE search_vector @@ plainto_tsquery($2)
)
SELECT
  p.*,
  COALESCE(vr.vector_score, 0) * 0.7 +
  COALESCE(tr.text_score, 0) * 0.3 AS combined_score
FROM properties p
LEFT JOIN vector_results vr ON p.id = vr.property_id
LEFT JOIN text_results tr ON p.id = tr.property_id
WHERE vr.property_id IS NOT NULL OR tr.property_id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 20;

Performance Optimization and Best Practices

Index Strategy and Configuration

Optimal index configuration is crucial for production performance. The IVFFlat index in pgvector requires careful tuning:

sql
-- For datasets with 1M+ vectors, increase the lists parameter
CREATE INDEX property_embeddings_idx ON property_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);

-- Set probes for search-time accuracy/performance tuning
SET ivfflat.probes = 10;

The lists parameter should be approximately sqrt(rows), while probes controls the accuracy-performance tradeoff during search.
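That rule of thumb is easy to encode. The `recommendedLists` helper below is our own illustration of the guidance above, not part of pgvector; treat it as a starting point and tune against your actual workload.

```typescript
// Illustrative helper applying the lists ≈ sqrt(rows) rule of thumb above.
// The function name and the floor of 1 are our own choices.
function recommendedLists(rowCount: number): number {
  // pgvector requires at least one list; tiny tables don't need many.
  return Math.max(1, Math.round(Math.sqrt(rowCount)));
}

console.log(recommendedLists(1_000_000)); // 1000
console.log(recommendedLists(10_000));    // 100
```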

Batch Processing and Updates

Implement efficient batch processing for large-scale embedding generation:

typescript
class BatchEmbeddingProcessor {
  constructor(private db: Pool, private embedder: EmbeddingService) {}

  async processBatch(
    properties: Array<{ id: number; description: string }>,
    batchSize: number = 100
  ): Promise<void> {
    for (let i = 0; i < properties.length; i += batchSize) {
      const batch = properties.slice(i, i + batchSize);
      const embeddings = await Promise.all(
        batch.map((p) => this.embedder.generateEmbedding(p.description))
      );

      // Build a parameterized multi-row insert; never interpolate
      // descriptions directly into SQL (injection risk)
      const placeholders = batch
        .map((_, j) => `($${j * 3 + 1}, $${j * 3 + 2}, $${j * 3 + 3})`)
        .join(', ');
      const values = batch.flatMap((property, j) => [
        property.id,
        property.description,
        `[${embeddings[j].join(',')}]`,
      ]);

      await this.db.query(
        `INSERT INTO property_embeddings (property_id, description, embedding)
         VALUES ${placeholders}
         ON CONFLICT (property_id) DO UPDATE SET
           embedding = EXCLUDED.embedding,
           updated_at = NOW();`,
        values
      );
    }
  }
}

Monitoring and Maintenance

Implement comprehensive monitoring for embedding search systems:

  • Search latency: Track p95 and p99 response times
  • Index efficiency: Monitor index usage and query plans
  • Embedding quality: Implement A/B testing for embedding models
  • Storage growth: Plan for vector storage requirements
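As a starting point for latency tracking, p95 and p99 can be computed from recorded response times with the nearest-rank method. The `percentile` function and sample data below are our own illustration; production systems typically use histogram-based estimators rather than sorting raw samples.

```typescript
// Nearest-rank percentile over a sample of recorded latencies (ms).
function percentile(samples: number[], q: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(q * sorted.length); // nearest-rank method
  return sorted[Math.min(rank, sorted.length) - 1];
}

const latencies = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
console.log(percentile(latencies, 0.95)); // 95
console.log(percentile(latencies, 0.99)); // 99
```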
💡
Pro Tip
Regularly analyze search patterns to optimize your embedding strategy. User interaction data provides valuable feedback for model selection and parameter tuning.

Cost Optimization Strategies

Vector storage and computation can be expensive at scale. Consider these optimization strategies:

  • Dimensionality reduction: Use PCA or other techniques for less critical applications
  • Selective embedding: Generate embeddings only for frequently searched content
  • Caching strategies: Cache embeddings for common queries
  • Incremental updates: Update embeddings only when property data changes significantly
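The incremental-update idea can be sketched with a content fingerprint: re-embed only when a property's description actually changed. The `fingerprint` and `needsReembedding` helpers below are hypothetical illustrations using a simple djb2 string hash; a real system might instead store a SHA-256 of the description alongside the embedding row.

```typescript
// Sketch of change detection for incremental embedding updates.
// fingerprint() is a djb2 string hash -- illustrative only.
function fingerprint(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Re-embed only when the stored fingerprint no longer matches the text.
function needsReembedding(storedHash: number | null, description: string): boolean {
  return storedHash === null || storedHash !== fingerprint(description);
}

const hash = fingerprint('Cozy downtown apartment');
console.log(needsReembedding(hash, 'Cozy downtown apartment')); // false
console.log(needsReembedding(hash, 'Spacious suburban home'));  // true
```

Skipping the embedding API call for unchanged descriptions is usually the single biggest cost saving in a pipeline that re-syncs listings on a schedule.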

Production Deployment and Scaling Considerations

Architecture Patterns

Successful production deployments typically follow one of these patterns:

Integrated Pattern: Store embeddings directly in your main PostgreSQL database alongside property data. This approach simplifies data consistency but may impact performance for high-volume applications.

Separated Pattern: Use a dedicated PostgreSQL instance for vector operations while maintaining property data in your primary database. This pattern provides better resource isolation and scaling flexibility.

Hybrid Pattern: Implement a combination approach where frequently accessed embeddings live in the main database while archival or specialized embeddings use separate storage.

Scaling Vector Operations

As your property database grows, consider these scaling strategies:

typescript
// DatabaseShard and the getShardForKey, getAllShards, generateEmbedding,
// and mergeShardResults helpers are application-specific pieces assumed
// to exist; this class sketches only the fan-out logic.
class ScalableVectorSearch {
  async searchWithSharding(
    query: string,
    shardKey?: string
  ): Promise<SearchResult[]> {
    const queryEmbedding = await this.generateEmbedding(query);

    // Distribute the search across multiple database shards
    const shards = shardKey
      ? [this.getShardForKey(shardKey)]
      : this.getAllShards();
    const searchPromises = shards.map((shard) =>
      this.searchShard(shard, queryEmbedding)
    );
    const results = await Promise.all(searchPromises);

    // Merge and rank results across shards
    return this.mergeShardResults(results);
  }

  private async searchShard(
    shard: DatabaseShard,
    embedding: number[]
  ): Promise<SearchResult[]> {
    const query = `
      SELECT property_id, description, 1 - (embedding <=> $1) AS score
      FROM property_embeddings
      ORDER BY embedding <=> $1
      LIMIT 50;
    `;
    return shard.query(query, [`[${embedding.join(',')}]`]);
  }
}

Ensuring High Availability

Vector search systems require careful attention to availability and disaster recovery:

  • Read replicas: Distribute search load across multiple read replicas
  • Embedding backup: Implement robust backup strategies for vector data
  • Fallback mechanisms: Provide graceful degradation to keyword search when vector search fails
  • Health monitoring: Track embedding freshness and search quality metrics
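The fallback point deserves code: degrading to keyword search should be a deliberate path, not an unhandled exception. The sketch below is illustrative; `SearchFn` and the stub search functions are hypothetical stand-ins for your own vector and keyword search services.

```typescript
// Sketch of graceful degradation: try vector search first, fall back to
// keyword search on failure.
type SearchFn = (query: string) => Promise<string[]>;

async function searchWithFallback(
  query: string,
  vectorSearch: SearchFn,
  keywordSearch: SearchFn
): Promise<{ results: string[]; degraded: boolean }> {
  try {
    return { results: await vectorSearch(query), degraded: false };
  } catch {
    // Vector search unavailable: degrade to keyword search rather than fail.
    return { results: await keywordSearch(query), degraded: true };
  }
}

// Simulated outage: the vector path throws, the keyword path answers.
const failing: SearchFn = async () => { throw new Error('index offline'); };
const keyword: SearchFn = async (q) => [`keyword match for "${q}"`];

searchWithFallback('cozy downtown apartment', failing, keyword)
  .then((r) => console.log(r.degraded, r.results)); // degraded: true, keyword results
```

Surfacing the `degraded` flag to your monitoring lets you alert on fallback rate rather than on user-facing errors.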

The integration of embeddings search with SQL databases represents more than a technical upgrade—it's a fundamental shift toward more intelligent, user-centric property technology. Organizations that master this transition will deliver superior user experiences while building more scalable, maintainable systems.

At PropTechUSA.ai, we've seen how proper implementation of vector similarity search transforms not just search quality, but entire product experiences. Users discover properties they wouldn't have found through traditional search, leading to higher engagement and conversion rates.

The key to success lies in thoughtful implementation that balances performance, cost, and user experience. Start with a focused use case, measure results rigorously, and scale based on demonstrated value.

Ready to implement embeddings search in your property technology stack? Begin with a pilot project focusing on your most critical search use case. The combination of PostgreSQL's reliability with modern vector capabilities provides a foundation for building the next generation of property technology platforms.

Connect with our team at PropTechUSA.ai to explore how vector similarity search can transform your property technology platform. We provide comprehensive guidance on implementation strategies, performance optimization, and scaling considerations tailored to your specific requirements.
