sampleclean.clean.deduplication

EntityResolution

class EntityResolution extends SampleCleanAlgorithm

Base class for attribute deduplication. It provides the basic structure and error handling for the algorithm.

The companion object provides a few common scenarios.

Linear Supertypes

SampleCleanAlgorithm, AnyRef, Any

Instance Constructors

  1. new EntityResolution(params: AlgorithmParameters, scc: SampleCleanContext, sampleTableName: String, components: BlockerMatcherSelfJoinSequence)

    params

    algorithm parameters, including "attr" and "mergeStrategy". "attr" is the name of the attribute to deduplicate. "mergeStrategy" is the strategy used to resolve a set of duplicated attribute values to a single value, e.g. picking one of {USA, U.S.A, United States, United States of America, ...}.

    Allowed strategies are "mostConcise" and "mostFrequent": "mostConcise" picks the shortest string; "mostFrequent" picks the most common string.

    scc

    the SampleCleanContext

    sampleTableName

    components

    blocker + matcher routine.
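
The two merge strategies described for params can be illustrated with a minimal, standalone sketch. It is not the SampleClean implementation: the object MergeStrategySketch and its methods are hypothetical names, and the tie-breaking rule (lexicographically smallest value) and the exception thrown for an unknown strategy are assumptions made so that the example is deterministic.

```scala
// Hypothetical helper for illustration only; not part of the SampleClean API.
object MergeStrategySketch {

  // "mostConcise": the shortest string wins.
  // Assumption: ties are broken by lexicographic order.
  def mostConcise(values: Seq[String]): String =
    values.minBy(v => (v.length, v))

  // "mostFrequent": the most common string wins.
  // Assumption: ties are broken by lexicographic order.
  def mostFrequent(values: Seq[String]): String =
    values.groupBy(identity)
      .map { case (value, occurrences) => (value, occurrences.size) }
      .toSeq
      .minBy { case (value, count) => (-count, value) }
      ._1

  // Dispatches on the strategy name and rejects anything else.
  // Assumption: an unknown strategy is reported with IllegalArgumentException.
  def canonicalize(strategy: String, values: Seq[String]): String = {
    require(values.nonEmpty, "cannot canonicalize an empty set of duplicates")
    strategy match {
      case "mostConcise"  => mostConcise(values)
      case "mostFrequent" => mostFrequent(values)
      case other =>
        throw new IllegalArgumentException(s"Unknown merge strategy: $other")
    }
  }
}
```

For the duplicate set Seq("United States", "USA", "United States", "U.S.A"), this sketch picks "USA" under "mostConcise" and "United States" under "mostFrequent".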

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val attr: String

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def exec(): Unit

    The execution function of this algorithm

    Definition Classes
    EntityResolution → SampleCleanAlgorithm
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. var mergeStrategy: String

  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. def onUpdateNotify(): Unit

    This function is called by the algorithm designer to notify the pipeline that the model has been updated.

    Definition Classes
    SampleCleanAlgorithm
  21. def setCanonicalizationStrategy(strategy: String): Unit

    Sets the canonicalization strategy

  22. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  23. def synchronousExecAndRead(): RDD[Row]

    Definition Classes
    SampleCleanAlgorithm
  24. def toString(): String

    Definition Classes
    AnyRef → Any
  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from SampleCleanAlgorithm

Inherited from AnyRef

Inherited from Any
