AI & Machine Learning

Embeddings Search: Vector Similarity with SQL Databases

Master embeddings search and vector similarity in PostgreSQL. Learn implementation strategies, performance optimization, and real-world applications for modern AI systems.

· By PropTechUSA AI

The property technology landscape is undergoing a seismic shift. Traditional keyword-based search systems are giving way to sophisticated semantic search capabilities powered by vector embeddings. As PropTech companies grapple with increasingly complex data requirements—from natural language property descriptions to image-based property matching—the ability to implement efficient embeddings search within familiar SQL databases has become a critical competitive advantage.

The Evolution of Search in Property Technology

From Keywords to Semantic Understanding

Traditional property search relied heavily on exact keyword matches and structured filters. A user searching for "cozy downtown apartment" would only find listings containing those exact terms. This approach left significant gaps in search quality and user experience.

Vector embeddings revolutionize this paradigm by converting text, images, and other data types into high-dimensional numerical representations that capture semantic meaning. Properties described as "intimate urban dwelling" and "cozy downtown apartment" now cluster together in vector space, enabling truly intelligent search experiences.

The PostgreSQL Advantage

While specialized vector databases have emerged, PostgreSQL's extension ecosystem—particularly pgvector—provides a compelling alternative. Organizations can leverage existing database infrastructure, maintain ACID compliance, and benefit from familiar SQL operations while adding powerful vector similarity capabilities.

💡
Pro Tip
PostgreSQL with pgvector offers the best of both worlds: battle-tested relational database features combined with cutting-edge vector similarity search.

Real-World Impact in PropTech

At PropTechUSA.ai, we've observed dramatic improvements in search relevance when clients transition from keyword-based to embedding-powered search systems. Property matching accuracy typically improves by 40-60%, while user engagement metrics show corresponding increases.

Understanding Vector Embeddings and Similarity

The Mathematics Behind Embeddings

Vector embeddings transform discrete data into continuous vector spaces where similar items cluster together. For property descriptions, a model like OpenAI's text-embedding-ada-002 converts text into 1536-dimensional vectors that capture semantic relationships.

The key insight is that mathematical operations in vector space correspond to semantic relationships in the original domain. Properties with similar characteristics will have vectors with high cosine similarity, typically measured as:

sql
SELECT
  property_id,
  description,
  1 - (embedding <=> query_embedding) AS similarity_score
FROM properties
ORDER BY embedding <=> query_embedding
LIMIT 10;

Distance Metrics and Their Applications

Different distance metrics serve different use cases in property technology:

  • Cosine Distance: Ideal for text embeddings and property descriptions
  • Euclidean Distance: Better for numerical features and property attributes
  • Dot Product: Useful when embedding magnitudes carry meaning

The choice of distance metric significantly impacts search results. Cosine distance normalizes for vector magnitude, making it robust for text-based property searches where document length shouldn't affect relevance.
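To make the tradeoffs concrete, here is a small, self-contained TypeScript sketch of the three metrics. The helper names are our own illustrations; in pgvector itself, cosine distance, Euclidean (L2) distance, and negative inner product are exposed as the `<=>`, `<->`, and `<#>` operators respectively.

```typescript
// Illustrative implementations of the three distance metrics discussed above.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function cosineDistance(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dotProduct(v, v));
  return 1 - dotProduct(a, b) / (norm(a) * norm(b));
}

// Cosine distance ignores magnitude: scaling a vector leaves it unchanged.
const short = [1, 2, 3];
const long = [2, 4, 6]; // same direction, twice the magnitude
console.log(cosineDistance(short, long));    // ~0: identical direction
console.log(euclideanDistance(short, long)); // > 0: magnitudes differ
```

The last two lines show exactly why cosine distance suits text search: a long and a short description pointing in the same semantic direction score as identical, while Euclidean distance would penalize the length difference.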

Embedding Dimensionality Considerations

Higher-dimensional embeddings capture more nuanced relationships but increase storage and computational requirements. Modern embedding models typically use 768, 1024, or 1536 dimensions. For property data, we've found 1536-dimensional embeddings provide the optimal balance between semantic richness and performance.

⚠️
Warning
Higher dimensionality doesn't always mean better results. Test different embedding models and dimensions with your specific property dataset to find the optimal configuration.

Implementation Strategies with PostgreSQL

Setting Up pgvector

Begin by installing and configuring pgvector in your PostgreSQL instance:

sql
-- Install pgvector extension
CREATE EXTENSION vector;

-- Create table with vector column
CREATE TABLE property_embeddings (
  id SERIAL PRIMARY KEY,
  property_id INTEGER NOT NULL UNIQUE, -- unique so ON CONFLICT (property_id) works on upserts
  description TEXT,
  embedding vector(1536),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Create index for efficient similarity search
CREATE INDEX ON property_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Generating and Storing Embeddings

Here's a TypeScript implementation for generating and storing property embeddings:

typescript
import { OpenAI } from 'openai';
import { Pool, PoolConfig } from 'pg';

interface PropertyEmbedding {
  propertyId: number;
  description: string;
  embedding: number[];
}

class EmbeddingService {
  private openai: OpenAI;
  private db: Pool;

  constructor(openaiKey: string, dbConfig: PoolConfig) {
    this.openai = new OpenAI({ apiKey: openaiKey });
    this.db = new Pool(dbConfig);
  }

  async generateEmbedding(text: string): Promise<number[]> {
    const response = await this.openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: text,
    });
    return response.data[0].embedding;
  }

  async storePropertyEmbedding(property: PropertyEmbedding): Promise<void> {
    const query = `
      INSERT INTO property_embeddings (property_id, description, embedding)
      VALUES ($1, $2, $3)
      ON CONFLICT (property_id)
      DO UPDATE SET
        description = EXCLUDED.description,
        embedding = EXCLUDED.embedding,
        updated_at = NOW();
    `;
    await this.db.query(query, [
      property.propertyId,
      property.description,
      `[${property.embedding.join(',')}]`, // pgvector accepts the '[x,y,...]' text format
    ]);
  }
}

Implementing Similarity Search

Create robust search functionality that combines vector similarity with traditional filters:

typescript
interface SearchParams {
  query: string;
  priceRange?: [number, number];
  location?: string;
  limit?: number;
}

interface SearchResult {
  propertyId: number;
  description: string;
  similarityScore: number;
  metadata?: any;
}

class PropertySearchService {
  constructor(private db: Pool, private embedder: EmbeddingService) {}

  async searchSimilarProperties(params: SearchParams): Promise<SearchResult[]> {
    // Generate an embedding for the search query
    const queryEmbedding = await this.embedder.generateEmbedding(params.query);

    const query = `
      SELECT
        p.id AS property_id,
        p.description,
        p.price,
        p.location,
        1 - (pe.embedding <=> $1) AS similarity_score
      FROM properties p
      JOIN property_embeddings pe ON p.id = pe.property_id
      WHERE ($2::int IS NULL OR p.price BETWEEN $2 AND $3)
        AND ($4::text IS NULL OR p.location ILIKE $4)
      ORDER BY pe.embedding <=> $1
      LIMIT $5;
    `;

    const result = await this.db.query(query, [
      `[${queryEmbedding.join(',')}]`,
      params.priceRange?.[0] ?? null,
      params.priceRange?.[1] ?? null,
      params.location ? `%${params.location}%` : null,
      params.limit ?? 20,
    ]);

    return result.rows.map((row) => ({
      propertyId: row.property_id,
      description: row.description,
      similarityScore: row.similarity_score,
      metadata: {
        price: row.price,
        location: row.location,
      },
    }));
  }
}

Hybrid Search Implementation

Combine vector similarity with full-text search for optimal results:

sql
WITH vector_results AS (
  SELECT
    property_id,
    1 - (embedding <=> $1) AS vector_score
  FROM property_embeddings
  ORDER BY embedding <=> $1
  LIMIT 100
),
text_results AS (
  SELECT
    id AS property_id,
    ts_rank_cd(search_vector, plainto_tsquery($2)) AS text_score
  FROM properties
  WHERE search_vector @@ plainto_tsquery($2)
)
SELECT
  p.*,
  COALESCE(vr.vector_score, 0) * 0.7 +
  COALESCE(tr.text_score, 0) * 0.3 AS combined_score
FROM properties p
LEFT JOIN vector_results vr ON p.id = vr.property_id
LEFT JOIN text_results tr ON p.id = tr.property_id
WHERE vr.property_id IS NOT NULL OR tr.property_id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 20;

Performance Optimization and Best Practices

Index Strategy and Configuration

Optimal index configuration is crucial for production performance. The IVFFlat index in pgvector requires careful tuning:

sql
-- For datasets with 1M+ vectors, increase the lists parameter
CREATE INDEX property_embeddings_idx ON property_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);

-- Set probes for search-time accuracy/performance tuning
SET ivfflat.probes = 10;

The lists parameter should be approximately sqrt(rows), while probes controls the accuracy-performance tradeoff during search.
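That rule of thumb is easy to encode. The `recommendedLists` helper below is our own illustration of the guidance above, not part of pgvector; treat it as a starting point and tune against your actual workload.

```typescript
// Illustrative helper applying the lists ≈ sqrt(rows) rule of thumb above.
// The function name and the floor of 1 are our own choices.
function recommendedLists(rowCount: number): number {
  // pgvector requires at least one list; tiny tables don't need many.
  return Math.max(1, Math.round(Math.sqrt(rowCount)));
}

console.log(recommendedLists(1_000_000)); // 1000
console.log(recommendedLists(10_000));    // 100
```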

Batch Processing and Updates

Implement efficient batch processing for large-scale embedding generation:

typescript
class BatchEmbeddingProcessor {
  constructor(private db: Pool, private embedder: EmbeddingService) {}

  async processBatch(
    properties: Array<{ id: number; description: string }>,
    batchSize: number = 100
  ): Promise<void> {
    for (let i = 0; i < properties.length; i += batchSize) {
      const batch = properties.slice(i, i + batchSize);
      const embeddings = await Promise.all(
        batch.map((p) => this.embedder.generateEmbedding(p.description))
      );

      // Build a parameterized multi-row insert; never interpolate
      // descriptions directly into SQL (injection risk)
      const placeholders = batch
        .map((_, j) => `($${j * 3 + 1}, $${j * 3 + 2}, $${j * 3 + 3})`)
        .join(', ');
      const values = batch.flatMap((property, j) => [
        property.id,
        property.description,
        `[${embeddings[j].join(',')}]`,
      ]);

      await this.db.query(
        `INSERT INTO property_embeddings (property_id, description, embedding)
         VALUES ${placeholders}
         ON CONFLICT (property_id) DO UPDATE SET
           embedding = EXCLUDED.embedding,
           updated_at = NOW();`,
        values
      );
    }
  }
}

Monitoring and Maintenance

Implement comprehensive monitoring for embedding search systems:

  • Search latency: Track p95 and p99 response times
  • Index efficiency: Monitor index usage and query plans
  • Embedding quality: Implement A/B testing for embedding models
  • Storage growth: Plan for vector storage requirements
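As a starting point for latency tracking, p95 and p99 can be computed from recorded response times with the nearest-rank method. The `percentile` function and sample data below are our own illustration; production systems typically use histogram-based estimators rather than sorting raw samples.

```typescript
// Nearest-rank percentile over a sample of recorded latencies (ms).
function percentile(samples: number[], q: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(q * sorted.length); // nearest-rank method
  return sorted[Math.min(rank, sorted.length) - 1];
}

const latencies = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
console.log(percentile(latencies, 0.95)); // 95
console.log(percentile(latencies, 0.99)); // 99
```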
💡
Pro Tip
Regularly analyze search patterns to optimize your embedding strategy. User interaction data provides valuable feedback for model selection and parameter tuning.

Cost Optimization Strategies

Vector storage and computation can be expensive at scale. Consider these optimization strategies:

  • Dimensionality reduction: Use PCA or other techniques for less critical applications
  • Selective embedding: Generate embeddings only for frequently searched content
  • Caching strategies: Cache embeddings for common queries
  • Incremental updates: Update embeddings only when property data changes significantly
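The incremental-update idea can be sketched with a content fingerprint: re-embed only when a property's description actually changed. The `fingerprint` and `needsReembedding` helpers below are hypothetical illustrations using a simple djb2 string hash; a real system might instead store a SHA-256 of the description alongside the embedding row.

```typescript
// Sketch of change detection for incremental embedding updates.
// fingerprint() is a djb2 string hash -- illustrative only.
function fingerprint(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Re-embed only when the stored fingerprint no longer matches the text.
function needsReembedding(storedHash: number | null, description: string): boolean {
  return storedHash === null || storedHash !== fingerprint(description);
}

const hash = fingerprint('Cozy downtown apartment');
console.log(needsReembedding(hash, 'Cozy downtown apartment')); // false
console.log(needsReembedding(hash, 'Spacious suburban home'));  // true
```

Skipping the embedding API call for unchanged descriptions is usually the single biggest cost saving in a pipeline that re-syncs listings on a schedule.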

Production Deployment and Scaling Considerations

Architecture Patterns

Successful production deployments typically follow one of these patterns:

Integrated Pattern: Store embeddings directly in your main PostgreSQL database alongside property data. This approach simplifies data consistency but may impact performance for high-volume applications.

Separated Pattern: Use a dedicated PostgreSQL instance for vector operations while maintaining property data in your primary database. This pattern provides better resource isolation and scaling flexibility.

Hybrid Pattern: Implement a combination approach where frequently accessed embeddings live in the main database while archival or specialized embeddings use separate storage.

Scaling Vector Operations

As your property database grows, consider these scaling strategies:

typescript
// DatabaseShard and the getShardForKey, getAllShards, generateEmbedding,
// and mergeShardResults helpers are application-specific pieces assumed
// to exist; this class sketches only the fan-out logic.
class ScalableVectorSearch {
  async searchWithSharding(
    query: string,
    shardKey?: string
  ): Promise<SearchResult[]> {
    const queryEmbedding = await this.generateEmbedding(query);

    // Distribute the search across multiple database shards
    const shards = shardKey
      ? [this.getShardForKey(shardKey)]
      : this.getAllShards();
    const searchPromises = shards.map((shard) =>
      this.searchShard(shard, queryEmbedding)
    );
    const results = await Promise.all(searchPromises);

    // Merge and rank results across shards
    return this.mergeShardResults(results);
  }

  private async searchShard(
    shard: DatabaseShard,
    embedding: number[]
  ): Promise<SearchResult[]> {
    const query = `
      SELECT property_id, description, 1 - (embedding <=> $1) AS score
      FROM property_embeddings
      ORDER BY embedding <=> $1
      LIMIT 50;
    `;
    return shard.query(query, [`[${embedding.join(',')}]`]);
  }
}

Ensuring High Availability

Vector search systems require careful attention to availability and disaster recovery:

  • Read replicas: Distribute search load across multiple read replicas
  • Embedding backup: Implement robust backup strategies for vector data
  • Fallback mechanisms: Provide graceful degradation to keyword search when vector search fails
  • Health monitoring: Track embedding freshness and search quality metrics
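The fallback point deserves code: degrading to keyword search should be a deliberate path, not an unhandled exception. The sketch below is illustrative; `SearchFn` and the stub search functions are hypothetical stand-ins for your own vector and keyword search services.

```typescript
// Sketch of graceful degradation: try vector search first, fall back to
// keyword search on failure.
type SearchFn = (query: string) => Promise<string[]>;

async function searchWithFallback(
  query: string,
  vectorSearch: SearchFn,
  keywordSearch: SearchFn
): Promise<{ results: string[]; degraded: boolean }> {
  try {
    return { results: await vectorSearch(query), degraded: false };
  } catch {
    // Vector search unavailable: degrade to keyword search rather than fail.
    return { results: await keywordSearch(query), degraded: true };
  }
}

// Simulated outage: the vector path throws, the keyword path answers.
const failing: SearchFn = async () => { throw new Error('index offline'); };
const keyword: SearchFn = async (q) => [`keyword match for "${q}"`];

searchWithFallback('cozy downtown apartment', failing, keyword)
  .then((r) => console.log(r.degraded, r.results)); // degraded: true, keyword results
```

Surfacing the `degraded` flag to your monitoring lets you alert on fallback rate rather than on user-facing errors.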

The integration of embeddings search with SQL databases represents more than a technical upgrade—it's a fundamental shift toward more intelligent, user-centric property technology. Organizations that master this transition will deliver superior user experiences while building more scalable, maintainable systems.

At PropTechUSA.ai, we've seen how proper implementation of vector similarity search transforms not just search quality, but entire product experiences. Users discover properties they wouldn't have found through traditional search, leading to higher engagement and conversion rates.

The key to success lies in thoughtful implementation that balances performance, cost, and user experience. Start with a focused use case, measure results rigorously, and scale based on demonstrated value.

Ready to implement embeddings search in your property technology stack? Begin with a pilot project focusing on your most critical search use case. The combination of PostgreSQL's reliability with modern vector capabilities provides a foundation for building the next generation of property technology platforms.

Connect with our team at PropTechUSA.ai to explore how vector similarity search can transform your property technology platform. We provide comprehensive guidance on implementation strategies, performance optimization, and scaling considerations tailored to your specific requirements.
