sampleclean.clean.deduplication

EntityResolution

object EntityResolution

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def createCrowdMatcher(scc: SampleCleanContext, attr: String, sampleName: String): ActiveLearningMatcher

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def longAttributeCanonicalize(scc: SampleCleanContext, sampleName: String, attribute: String, threshold: Double = 0.9, weighting: Boolean = true): EntityResolution

    This method builds an Entity Resolution algorithm that will resolve automatically. It uses several default values and is designed for simple Entity Resolution tasks. For more flexibility in parameters (such as setting a Similarity Featurizer and Tokenizer), refer to the EntityResolution class.

    This algorithm uses the Weighted Jaccard Similarity for pairwise comparisons and a word tokenizer.

    scc

    SampleClean Context

    sampleName
    attribute

    name of the attribute to resolve

    threshold

    similarity threshold used in the algorithm; must be between 0.0 and 1.0

    weighting

    If set to true, the algorithm will automatically calculate token weights. Default token weights are defined based on token idf values.

    Adding weights to the join might make pair comparisons more reliable, and might speed up the algorithm when the dataset contains an abundance of common words.
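    To illustrate the Weighted Jaccard Similarity and the idf-based token weighting described above, here is a minimal, self-contained sketch. It does not use the SampleClean API: the object `WeightedJaccardSketch`, its methods, the regex-based word tokenizer, the idf formula ln(N / df), and the `>=` threshold test are hypothetical choices for illustration, not the library's implementation.

```scala
// Hypothetical sketch; not the SampleClean implementation.
object WeightedJaccardSketch {
  // Hypothetical word tokenizer: lowercase, then split on runs of characters
  // that are neither letters nor digits. The library's tokenizer may differ.
  def tokenize(s: String): Set[String] =
    s.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSet

  // idf(t) = ln(N / df(t)), where N is the number of records and df(t) is the
  // number of records containing t. A token present in every record gets weight 0.
  def idfWeights(records: Seq[String]): Map[String, Double] = {
    val n = records.size.toDouble
    records.flatMap(tokenize)
      .groupBy(identity)
      .map { case (token, occurrences) => token -> math.log(n / occurrences.size) }
  }

  // Weighted Jaccard: total weight of shared tokens divided by the total weight
  // of all tokens in either value. Tokens missing from `weights` default to 1.0,
  // so an empty map yields plain (unweighted) Jaccard similarity.
  def similarity(a: String, b: String, weights: Map[String, Double] = Map.empty): Double = {
    val (ta, tb) = (tokenize(a), tokenize(b))
    def w(t: String): Double = weights.getOrElse(t, 1.0)
    // toSeq keeps equal weights from being collapsed by Set.map.
    val unionWeight = (ta union tb).toSeq.map(w).sum
    if (unionWeight == 0.0) 0.0 else (ta intersect tb).toSeq.map(w).sum / unionWeight
  }

  // A pair counts as a duplicate when its similarity reaches the threshold.
  def isMatch(a: String, b: String, threshold: Double, weights: Map[String, Double]): Boolean =
    similarity(a, b, weights) >= threshold
}
```

    For example, with the records "Acme Corp Inc", "Acme Corp", "Globex Corp Inc" and "Initech Corp", the token "corp" appears in every record and gets weight 0. The unweighted similarity of "Acme Corp" and "Globex Corp" is 1/3, coming entirely from "corp"; with the idf weights it drops to 0, which is the effect on common words that the `weighting` parameter describes.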

  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. def shortAttributeCanonicalize(scc: SampleCleanContext, sampleName: String, attribute: String, threshold: Double = 0.9, weighting: Boolean = true): EntityResolution

    This method builds an Entity Resolution algorithm that will resolve automatically. It uses several default values and is designed for simple Entity Resolution tasks. For more flexibility in parameters (such as setting a Similarity Featurizer and Tokenizer), refer to the EntityResolution class.

    This algorithm uses the Edit Distance Similarity for pairwise comparisons and a word tokenizer.

    scc

    SampleClean Context

    sampleName
    attribute

    name of the attribute to resolve

    threshold

    similarity threshold used in the algorithm; must be between 0.0 and 1.0

    weighting

    If set to true, the algorithm will automatically calculate token weights. Default token weights are defined based on token idf values.

    Adding weights to the join might make pair comparisons more reliable, and might speed up the algorithm when the dataset contains an abundance of common words.
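    The Edit Distance Similarity mentioned above can be sketched as a Levenshtein distance normalized into [0, 1], so that it can be compared against `threshold`. This is a minimal sketch under stated assumptions, not SampleClean's code: the object name, the normalization 1 - distance / max(length), and comparing whole attribute values as strings are hypothetical.

```scala
// Hypothetical sketch; not the SampleClean implementation.
object EditSimilaritySketch {
  // Levenshtein distance (insert, delete, substitute each cost 1) computed one row
  // at a time: before row i is filled, prev(j) is the distance between
  // a.take(i - 1) and b.take(j).
  def levenshtein(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val substitution = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + substitution)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Normalize to [0, 1]: 1.0 for identical strings, lower as more edits are needed.
  def similarity(a: String, b: String): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / maxLen
  }

  // Hypothetical decision rule: match when the similarity reaches the threshold.
  def isMatch(a: String, b: String, threshold: Double = 0.9): Boolean =
    similarity(a, b) >= threshold
}
```

    For example, `levenshtein("kitten", "sitting")` is 3, giving a similarity of 1 - 3/7 ≈ 0.571, which does not pass the default threshold of 0.9, whereas "Jonathan Smith" and "Jonathon Smith" (one edit over 14 characters, ≈ 0.929) do. Edit distance suits short values such as names, where a single typo changes a whole word token.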

  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. def textAttributeActiveLearning(scc: SampleCleanContext, sampleName: String, attribute: String, threshold: Double = 0.9, weighting: Boolean = true): EntityResolution

    This method builds an Entity Resolution algorithm that will resolve asynchronously. It uses several default values and is designed for simple Entity Resolution tasks. For more flexibility in parameters (such as setting a Similarity Featurizer, Tokenizer, or Active Learning Strategy), refer to the EntityResolution class.

    This algorithm uses the Jaccard Similarity for pairwise filtering, the Levenshtein and JaroWinkler similarity measures for featurization, and a word tokenizer.

    scc

    SampleClean Context

    sampleName
    attribute

    name of the attribute to resolve

    threshold

    similarity threshold used in the algorithm; must be between 0.0 and 1.0

    weighting

    If set to true, the algorithm will automatically calculate token weights. Default token weights are defined based on token idf values.

    Adding weights to the join might make pair comparisons more reliable, and might speed up the algorithm when the dataset contains an abundance of common words.
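    The two stages described for `textAttributeActiveLearning`, Jaccard filtering followed by Levenshtein and JaroWinkler featurization, can be sketched as below. Everything here is hypothetical and independent of the SampleClean API: the object and method names, the tokenizer, the normalized Levenshtein similarity, and the standard Jaro-Winkler formulation (prefix scale 0.1, prefix capped at 4 characters) are assumptions. The active learner that would consume the feature vectors is not shown.

```scala
// Hypothetical sketch; not the SampleClean implementation.
object ActiveLearningFeaturesSketch {
  // Hypothetical word tokenizer (lowercase, split on non-alphanumerics).
  def tokens(s: String): Set[String] =
    s.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSet

  // Plain token-set Jaccard similarity, used only to filter candidate pairs.
  def jaccard(a: String, b: String): Double = {
    val (ta, tb) = (tokens(a), tokens(b))
    val union = (ta union tb).size
    if (union == 0) 0.0 else (ta intersect tb).size.toDouble / union
  }

  // Levenshtein distance normalized to [0, 1] (1.0 = identical).
  def levenshteinSim(a: String, b: String): Double = {
    var prev = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - prev(b.length).toDouble / maxLen
  }

  // Jaro similarity: count characters that match within a sliding window,
  // then penalize matched characters that appear in a different order.
  def jaro(s1: String, s2: String): Double = {
    if (s1.isEmpty && s2.isEmpty) return 1.0
    if (s1.isEmpty || s2.isEmpty) return 0.0
    val window = math.max(math.max(s1.length, s2.length) / 2 - 1, 0)
    val used1 = new Array[Boolean](s1.length)
    val used2 = new Array[Boolean](s2.length)
    var matches = 0
    for (i <- s1.indices) {
      var j = math.max(0, i - window)
      val hi = math.min(s2.length - 1, i + window)
      while (j <= hi && !used1(i)) {
        if (!used2(j) && s1(i) == s2(j)) { used1(i) = true; used2(j) = true; matches += 1 }
        j += 1
      }
    }
    if (matches == 0) return 0.0
    val m1 = s1.indices.filter(i => used1(i)).map(i => s1(i))
    val m2 = s2.indices.filter(j => used2(j)).map(j => s2(j))
    val transpositions = m1.zip(m2).count { case (x, y) => x != y } / 2.0
    val m = matches.toDouble
    (m / s1.length + m / s2.length + (m - transpositions) / m) / 3.0
  }

  // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
  def jaroWinkler(s1: String, s2: String, prefixScale: Double = 0.1): Double = {
    val j = jaro(s1, s2)
    val prefix = s1.zip(s2).take(4).takeWhile { case (x, y) => x == y }.length
    j + prefix * prefixScale * (1.0 - j)
  }

  // Stage 1: keep pairs whose token Jaccard similarity reaches the threshold.
  // Stage 2: describe each surviving pair by [Levenshtein similarity, Jaro-Winkler].
  def candidateFeatures(values: IndexedSeq[String], threshold: Double): Seq[((String, String), Vector[Double])] =
    for {
      i <- values.indices
      j <- (i + 1) until values.length
      if jaccard(values(i), values(j)) >= threshold
    } yield (values(i), values(j)) -> Vector(levenshteinSim(values(i), values(j)), jaroWinkler(values(i), values(j)))
}
```

    For the classic pair "MARTHA" / "MARHTA", the Jaro similarity is 17/18 ≈ 0.944 and the Jaro-Winkler similarity is about 0.961. Filtering with the cheap token-level Jaccard measure first keeps the character-level measures from running on pairs that share few words.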

  22. def toString(): String

    Definition Classes
    AnyRef → Any
  23. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
