Module: Familia::Features::Relationships::Indexing::MultiIndexGenerators

Defined in:
lib/familia/features/relationships/indexing/multi_index_generators.rb

Overview

Generators for multi-value index (1:many) methods

Multi-value indexes use the UnsortedSet DataType to group objects by field value. Each distinct field value gets its own set of object identifiers.

Example: multi_index :department, :dept_index, within: Company

Generates on Company (destination):

  • company.sample_from_department(dept, count=1)
  • company.find_all_by_department(dept)
  • company.dept_index_for(dept_value)
  • company.rebuild_dept_index

Generates on Employee (self):

  • employee.add_to_company_dept_index(company)
  • employee.remove_from_company_dept_index(company)
  • employee.update_in_company_dept_index(company, old_dept)
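
The grouping described above can be pictured with plain Ruby collections. This is a minimal sketch, not Familia code: `SketchMultiIndex` and its methods are hypothetical names, and a Hash of Sets stands in for the per-value UnsortedSet instances that the real module keeps in the database.

```ruby
require "set"

# Hypothetical stand-in for a multi-value index: one Set of identifiers per
# field value. The real module backs each set with a Familia::UnsortedSet.
class SketchMultiIndex
  def initialize
    @sets = Hash.new { |hash, value| hash[value] = Set.new }
  end

  # Add an identifier to the set for the given field value.
  def add(field_value, identifier)
    @sets[field_value] << identifier
  end

  # Remove an identifier from the set for the given field value.
  def remove(field_value, identifier)
    @sets[field_value].delete(identifier)
  end

  # All identifiers currently grouped under the field value.
  def members(field_value)
    @sets[field_value].to_a
  end
end

sketch_index = SketchMultiIndex.new
sketch_index.add("engineering", "emp_1")
sketch_index.add("engineering", "emp_2")
sketch_index.add("sales", "emp_3")
```

Looking up "engineering" here yields both employee identifiers, which is the 1:many shape that find_all_by_department relies on.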

Class Method Details

.generate_factory_method(scope_class, index_name) ⇒ Object

Generates the factory method ON THE SCOPE CLASS (Company when within: Company):

  • company.index_name_for(field_value) - DataType factory (always needed)

The mutation methods depend on this method, so it is generated even when query: false.

Parameters:

  • scope_class (Class)

    The scope class providing uniqueness context (e.g., Company)

  • index_name (Symbol)

    Name of the index (e.g., :dept_index)



# File 'lib/familia/features/relationships/indexing/multi_index_generators.rb', line 75

def generate_factory_method(scope_class, index_name)
  actual_scope_class = Familia.resolve_class(scope_class)

  actual_scope_class.class_eval do
    # Helper method to get index set for a specific field value
    # This acts as a factory for field-value-specific DataTypes
    define_method(:"#{index_name}_for") do |field_value|
      # Return properly managed DataType instance with parameterized key
      index_key = Familia.join(index_name, field_value)
      Familia::UnsortedSet.new(index_key, parent: self)
    end
  end
end
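
The factory pattern above can be tried without Familia. The following is a sketch under stated assumptions: `FactoryDemoSet`, `FactoryDemoCompany`, and `sketch_generate_factory_method` are hypothetical names, a Struct stands in for Familia::UnsortedSet, and a plain colon join stands in for Familia.join, whose actual delimiter is not shown in this page.

```ruby
# Hypothetical stand-ins; neither class is part of Familia.
FactoryDemoSet = Struct.new(:key, :parent)

class FactoryDemoCompany
  attr_reader :identifier

  def initialize(identifier)
    @identifier = identifier
  end
end

# Define "<index_name>_for" on the scope class, mirroring the define_method
# call in generate_factory_method.
def sketch_generate_factory_method(scope_class, index_name)
  scope_class.class_eval do
    define_method(:"#{index_name}_for") do |field_value|
      # Assumption: a colon join stands in for Familia.join.
      FactoryDemoSet.new([index_name, field_value].join(":"), self)
    end
  end
end

sketch_generate_factory_method(FactoryDemoCompany, :dept_index)
factory_demo_company = FactoryDemoCompany.new("acme")
factory_demo_set = factory_demo_company.dept_index_for("engineering")
```

Each call builds a fresh set object whose key embeds the field value and whose parent is the calling instance, which is why the mutation methods can obtain the right set without constructing keys themselves.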

.generate_mutation_methods_self(indexed_class, field, scope_class, index_name) ⇒ Object

Generates mutation methods ON THE INDEXED CLASS (Employee):

  • employee.add_to_company_dept_index(company)
  • employee.remove_from_company_dept_index(company)
  • employee.update_in_company_dept_index(company, old_dept)

Parameters:

  • indexed_class (Class)

    The class being indexed (e.g., Employee)

  • field (Symbol)

    The field to index (e.g., :department)

  • scope_class (Class)

    The scope class providing uniqueness context (e.g., Company)

  • index_name (Symbol)

    Name of the index (e.g., :dept_index)



# File 'lib/familia/features/relationships/indexing/multi_index_generators.rb', line 268

def generate_mutation_methods_self(indexed_class, field, scope_class, index_name)
  scope_class_config = scope_class.config_name
  indexed_class.class_eval do
    method_name = :"add_to_#{scope_class_config}_#{index_name}"
    Familia.debug("[MultiIndexGenerators] #{name} method #{method_name}")

    define_method(method_name) do |scope_instance|
      return unless scope_instance

      field_value = send(field)
      return unless field_value

      # Use helper method on scope instance instead of manual instantiation
      index_set = scope_instance.send("#{index_name}_for", field_value)

      # Use UnsortedSet DataType method (no scoring)
      index_set.add(identifier)
    end

    method_name = :"remove_from_#{scope_class_config}_#{index_name}"
    Familia.debug("[MultiIndexGenerators] #{name} method #{method_name}")

    define_method(method_name) do |scope_instance|
      return unless scope_instance

      field_value = send(field)
      return unless field_value

      # Use helper method on scope instance instead of manual instantiation
      index_set = scope_instance.send("#{index_name}_for", field_value)

      # Remove using UnsortedSet DataType method
      index_set.remove(identifier)
    end

    method_name = :"update_in_#{scope_class_config}_#{index_name}"
    Familia.debug("[MultiIndexGenerators] #{name} method #{method_name}")

    define_method(method_name) do |scope_instance, old_field_value = nil|
      return unless scope_instance

      new_field_value = send(field)

      # Use Familia's transaction method for atomicity with DataType abstraction
      scope_instance.transaction do |_tx|
        # Remove from old index if provided - use helper method
        if old_field_value
          old_index_set = scope_instance.send("#{index_name}_for", old_field_value)
          old_index_set.remove(identifier)
        end

        # Add to new index if present - use helper method
        if new_field_value
          new_index_set = scope_instance.send("#{index_name}_for", new_field_value)
          new_index_set.add(identifier)
        end
      end
    end
  end
end
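
One consequence of the code above is that the remove method reads the current field value, while the update method needs the previous value passed in. A minimal sketch of the calling pattern, assuming in-memory stand-ins (`UpdateDemoCompany` and `UpdateDemoEmployee` are hypothetical, and no transaction is involved):

```ruby
require "set"

# Hypothetical scope class; dept_index_for plays the role of the generated
# factory method and returns a plain Set instead of an UnsortedSet.
class UpdateDemoCompany
  def initialize
    @index_sets = Hash.new { |hash, value| hash[value] = Set.new }
  end

  def dept_index_for(field_value)
    @index_sets[field_value]
  end
end

# Hypothetical indexed class with hand-written versions of two generated methods.
class UpdateDemoEmployee
  attr_reader :identifier
  attr_accessor :department

  def initialize(identifier, department)
    @identifier = identifier
    @department = department
  end

  def add_to_company_dept_index(company)
    return unless company && department

    company.dept_index_for(department).add(identifier)
  end

  # Remove from the old value's set (if given), then add to the current one.
  # The real method wraps both steps in scope_instance.transaction.
  def update_in_company_dept_index(company, old_department = nil)
    return unless company

    company.dept_index_for(old_department).delete(identifier) if old_department
    company.dept_index_for(department).add(identifier) if department
  end
end

update_demo_company = UpdateDemoCompany.new
update_demo_employee = UpdateDemoEmployee.new("emp_1", "engineering")
update_demo_employee.add_to_company_dept_index(update_demo_company)

# Capture the old value before changing the field, then pass it along.
previous_department = update_demo_employee.department
update_demo_employee.department = "sales"
update_demo_employee.update_in_company_dept_index(update_demo_company, previous_department)
```

If the old value is omitted, the identifier stays in the previous set, so callers that change an indexed field should capture the value first.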

.generate_query_methods_destination(indexed_class, field, scope_class, index_name) ⇒ Object

Generates query methods ON THE SCOPE CLASS (Company when within: Company):

  • company.sample_from_department(dept, count=1) - random sampling
  • company.find_all_by_department(dept) - all objects
  • company.rebuild_dept_index - rebuild index

Parameters:

  • indexed_class (Class)

    The class being indexed (e.g., Employee)

  • field (Symbol)

    The field to index (e.g., :department)

  • scope_class (Class)

    The scope class providing uniqueness context (e.g., Company)

  • index_name (Symbol)

    Name of the index (e.g., :dept_index)



# File 'lib/familia/features/relationships/indexing/multi_index_generators.rb', line 98

def generate_query_methods_destination(indexed_class, field, scope_class, index_name)
  # Resolve scope class using Familia pattern
  actual_scope_class = Familia.resolve_class(scope_class)

  # Get scope_class_config for method naming (needed for rebuild methods)
  scope_class_config = actual_scope_class.config_name

  # Generate instance sampling method (e.g., company.sample_from_department)
  actual_scope_class.class_eval do

    define_method(:"sample_from_#{field}") do |field_value, count = 1|
      index_set = send("#{index_name}_for", field_value) # i.e. UnsortedSet

      # Get random members via SRANDMEMBER with count (O(N) in the requested count)
      # Returns array even for count=1 for consistent API
      index_set.sample(count).map do |id|
        indexed_class.find_by_identifier(id)
      end
    end

    # Generate bulk query method (e.g., company.find_all_by_department)
    define_method(:"find_all_by_#{field}") do |field_value|
      index_set = send("#{index_name}_for", field_value) # i.e. UnsortedSet

      # Get all members from set
      index_set.members.map { |id| indexed_class.find_by_identifier(id) }
    end

    # Generate method to rebuild the multi-value index for this parent instance
    #
    # Multi-indexes create separate sets for each field value. Once the source
    # collection has been located, the rebuild runs in three phases:
    # 1. Loading: Load all objects once and cache them (discovers field values simultaneously)
    # 2. Clearing: Remove all existing index sets using SCAN
    # 3. Rebuilding: Rebuild index from cached objects (no reload needed)
    #
    # @param batch_size [Integer] Number of identifiers to process per batch
    # @yield [progress] Optional block called with progress updates
    # @yieldparam progress [Hash] Progress information with keys:
    #   - :phase [Symbol] Current phase (:loading, :clearing, :rebuilding)
    #   - :current [Integer] Current item count
    #   - :total [Integer] Total items (when known)
    #   - :key [String] Index key that was just deleted (:clearing phase only)
    #
    # @example Basic rebuild
    #   company.rebuild_dept_index
    #
    # @example With progress monitoring
    #   company.rebuild_dept_index do |progress|
    #     puts "#{progress[:phase]}: #{progress[:current]}/#{progress[:total]}"
    #   end
    #
    # @example Smaller batches for large collections
    #   # Smaller batches shrink each load and transaction; total memory use is unchanged (see note)
    #   company.rebuild_dept_index(batch_size: 50)
    #
    # @note Memory Considerations:
    #   This method caches all objects in memory during rebuild to avoid duplicate
    #   database loads. For very large collections (>100k objects), monitor memory usage
    #   and consider processing in chunks or using a streaming approach if memory
    #   constraints are encountered. The batch_size parameter controls Redis I/O
    #   batching but does not affect memory usage since all objects are cached.
    #
    define_method(:"rebuild_#{index_name}") do |batch_size: 100, &progress_block|
      # PHASE 1: Find the collection containing the indexed objects
      # Look for a participation relationship where indexed_class participates in this scope_class
      collection_name = nil

      # Check if indexed_class has participation to this scope_class
      if indexed_class.respond_to?(:participation_relationships)
        participation = indexed_class.participation_relationships.find do |rel|
          rel.target_class == self.class
        end
        collection_name = participation&.collection_name
      end

      # Get the collection DataType if we found a participation relationship
      collection = collection_name ? send(collection_name) : nil

      if collection
        # PHASE 2: Load objects once and cache them for both discovery and rebuilding
        # This avoids duplicate load_multi calls (previous approach loaded twice)
        progress_block&.call(phase: :loading, current: 0, total: collection.size)

        field_values = Set.new
        cached_objects = []
        processed = 0

        collection.members.each_slice(batch_size) do |identifiers|
          # Load objects in batches - SINGLE LOAD for both phases
          objects = indexed_class.load_multi(identifiers).compact
          cached_objects.concat(objects)

          objects.each do |obj|
            value = obj.send(field)
            # Only track non-nil, non-empty field values
            field_values << value.to_s if value && !value.to_s.strip.empty?
          end

          processed += identifiers.size
          progress_block&.call(phase: :loading, current: processed, total: collection.size)
        end

        # PHASE 3: Clear all existing field-value-specific index sets
        # Use SCAN to find all existing index keys (including orphaned ones from deleted field values)
        progress_block&.call(phase: :clearing, current: 0, total: field_values.size)

        # Get the base pattern for this index by creating a sample index set
        # The "*" creates a wildcard pattern like "company:123:dept_index:*" for SCAN
        sample_index = send(:"#{index_name}_for", "*")
        index_pattern = sample_index.dbkey

        # Find all existing index keys using SCAN
        cleared_count = 0
        dbclient.scan_each(match: index_pattern) do |key|
          dbclient.del(key)
          cleared_count += 1
          progress_block&.call(phase: :clearing, current: cleared_count, total: field_values.size, key: key)
        end

        # PHASE 4: Rebuild index from cached objects (no reload needed)
        progress_block&.call(phase: :rebuilding, current: 0, total: cached_objects.size)

        processed = 0
        cached_objects.each_slice(batch_size) do |objects|
          transaction do |_tx|
            objects.each do |obj|
              # Use the generated add_to method to maintain consistency
              # This ensures the same logic is used as during normal operation
              obj.send(:"add_to_#{scope_class_config}_#{index_name}", self)
            end
          end

          processed += objects.size
          progress_block&.call(phase: :rebuilding, current: processed, total: cached_objects.size)
        end

        Familia.info "[Rebuild] Multi-index #{index_name} rebuilt: #{field_values.size} field values, #{processed} objects"

        processed  # Return count of processed objects

      else
        # No participation relationship found - warn and suggest alternative
        Familia.warn <<~WARNING
          [Rebuild] Cannot rebuild multi-index #{index_name}: no participation relationship found

          Multi-index rebuild requires a participation relationship to find objects.
          Add a participation relationship to #{indexed_class.name}:

            class #{indexed_class.name} < Familia::Horreum
              participates_in #{self.class.name}, :collection_name, score: :field
            end

          Then access the collection via: #{self.class.config_name}.collection_name
        WARNING

        nil
      end
    end
  end
end
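
The load, clear, and rebuild flow with progress callbacks can be sketched in memory. This is an illustration under stated assumptions, not the generated method: `RebuildDemoEmployee` and `sketch_rebuild` are hypothetical, a Hash of Sets replaces the database keys, and clearing the Hash replaces the SCAN and DEL loop.

```ruby
require "set"

# Hypothetical in-memory version of the rebuild flow; no database involved.
RebuildDemoEmployee = Struct.new(:identifier, :department)

def sketch_rebuild(index_sets, employees, batch_size: 100, &progress_block)
  # Loading: cache objects once and discover non-empty field values.
  field_values = Set.new
  cached = []
  employees.each_slice(batch_size) do |batch|
    cached.concat(batch)
    batch.each do |employee|
      value = employee.department
      field_values << value.to_s if value && !value.to_s.strip.empty?
    end
    progress_block&.call(phase: :loading, current: cached.size, total: employees.size)
  end

  # Clearing: drop every existing set, including ones for orphaned values.
  index_sets.clear
  progress_block&.call(phase: :clearing, current: 0, total: field_values.size)

  # Rebuilding: repopulate from the cached objects without reloading.
  processed = 0
  cached.each_slice(batch_size) do |batch|
    batch.each do |employee|
      next unless employee.department

      (index_sets[employee.department] ||= Set.new) << employee.identifier
    end
    processed += batch.size
    progress_block&.call(phase: :rebuilding, current: processed, total: cached.size)
  end
  processed
end

rebuild_index_sets = { "defunct" => Set["emp_0"] }
rebuild_employees = [
  RebuildDemoEmployee.new("emp_1", "engineering"),
  RebuildDemoEmployee.new("emp_2", "engineering"),
  RebuildDemoEmployee.new("emp_3", "sales"),
]
rebuild_phases = []
rebuild_count = sketch_rebuild(rebuild_index_sets, rebuild_employees, batch_size: 2) do |progress|
  rebuild_phases << progress[:phase]
end
```

The stale "defunct" entry disappears because clearing removes every existing set rather than only those for field values seen during loading, which matches the reason the real method scans for keys by pattern.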

.setup(indexed_class:, field:, index_name:, within:, query:) ⇒ Object

Main setup method that orchestrates multi-value index creation

Parameters:

  • indexed_class (Class)

    The class being indexed (e.g., Employee)

  • field (Symbol)

    The field to index

  • index_name (Symbol)

    Name of the index

  • within (Class, Symbol)

    Scope class for instance-scoped index (required)

  • query (Boolean)

    Whether to generate query methods



# File 'lib/familia/features/relationships/indexing/multi_index_generators.rb', line 39

def setup(indexed_class:, field:, index_name:, within:, query:)
  # Multi-index always requires a scope context
  scope_class = within
  resolved_class = Familia.resolve_class(scope_class)

  # Store metadata for this indexing relationship
  indexed_class.indexing_relationships << IndexingRelationship.new(
    field:             field,
    scope_class:       scope_class,
    within:            within,
    index_name:        index_name,
    query:             query,
    cardinality:       :multi,
  )

  # Always generate the factory method - required by mutation methods
  if scope_class.is_a?(Class)
    generate_factory_method(resolved_class, index_name)
  end

  # Generate query methods on the scope class (optional)
  if query && scope_class.is_a?(Class)
    generate_query_methods_destination(indexed_class, field, resolved_class, index_name)
  end

  # Generate mutation methods on the indexed class
  generate_mutation_methods_self(indexed_class, field, resolved_class, index_name)
end
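
The effect of the query flag can be shown with a reduced version of this orchestration. A sketch only: `SetupDemoCompany`, `SetupDemoEmployee`, and `sketch_setup` are hypothetical, the generated method bodies are placeholders, and the "company" segment of the mutation method name is hard-coded where the real code uses the scope class's config_name.

```ruby
# Hypothetical empty classes standing in for the scope and indexed classes.
class SetupDemoCompany; end
class SetupDemoEmployee; end

def sketch_setup(indexed_class:, index_name:, within:, query:)
  # Factory method: always generated on the scope class.
  within.class_eval do
    define_method(:"#{index_name}_for") { |field_value| "#{index_name}:#{field_value}" }
  end

  # Query methods: generated only when query is truthy.
  if query
    within.class_eval do
      define_method(:"rebuild_#{index_name}") { :rebuilt }
    end
  end

  # Mutation methods: always generated on the indexed class.
  # Assumption: "company" is hard-coded in place of config_name.
  indexed_class.class_eval do
    define_method(:"add_to_company_#{index_name}") { |scope_instance| scope_instance }
  end
end

sketch_setup(
  indexed_class: SetupDemoEmployee,
  index_name:    :dept_index,
  within:        SetupDemoCompany,
  query:         false,
)
```

With query: false the scope class still gains dept_index_for, because the mutation methods on the indexed class call it, but it gains no rebuild or lookup methods.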