sampleclean.clean.deduplication.join

BlockerMatcherSelfJoinSequence

class BlockerMatcherSelfJoinSequence extends Serializable

This class wraps blocker and matcher routines. It has two constructors: one takes a blocker and a List[Matcher], the other takes a similarity join and a List[Matcher]. A similarity join is treated as a combined blocking and matching sequence.

We call this the "BlockerMatcherSelfJoinSequence" because the class applies the operation to a single sample, pairing it against itself.
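The blocker-then-matchers pattern described above can be sketched with plain Scala collections. This is a minimal sketch under stated assumptions, not the SampleClean implementation: `SketchBlocker`, `SketchMatcher`, `FirstCharBlocker`, and `NormalizedEqualityMatcher` are hypothetical stand-ins, `Seq[String]` stands in for `RDD[Row]`, and applying the matchers in list order is an assumption about how the sequence behaves.

```scala
// Simplified stand-ins for illustration only; none of these are SampleClean types.
object SelfJoinSketch {
  type Record = String
  type Pair = (Record, Record)

  // A blocker proposes candidate pairs from one collection (a self-join).
  trait SketchBlocker {
    def block(data: Seq[Record]): Seq[Pair]
  }

  // A matcher keeps only the candidate pairs it accepts.
  trait SketchMatcher {
    def matchPairs(candidates: Seq[Pair]): Seq[Pair]
  }

  // Pairs distinct records that share the same lower-cased first character.
  object FirstCharBlocker extends SketchBlocker {
    def block(data: Seq[Record]): Seq[Pair] =
      for {
        (a, i) <- data.zipWithIndex
        (b, j) <- data.zipWithIndex
        if i < j && a.nonEmpty && b.nonEmpty && a.head.toLower == b.head.toLower
      } yield (a, b)
  }

  // Accepts pairs that are equal after trimming and ignoring case.
  object NormalizedEqualityMatcher extends SketchMatcher {
    def matchPairs(candidates: Seq[Pair]): Seq[Pair] =
      candidates.filter { case (a, b) => a.trim.equalsIgnoreCase(b.trim) }
  }

  // Run the blocker once, then apply each matcher in list order (assumed ordering).
  def blockAndMatch(
      data: Seq[Record],
      blocker: SketchBlocker,
      matchers: List[SketchMatcher]
  ): Seq[Pair] =
    matchers.foldLeft(blocker.block(data))((pairs, m) => m.matchPairs(pairs))
}
```

In this sketch the blocker keeps the number of compared pairs small, and each matcher only ever sees the pairs that survived the previous step.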

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new BlockerMatcherSelfJoinSequence(scc: SampleCleanContext, sampleTableName: String, simjoin: SimilarityJoin, matchers: List[Matcher])

    Create a BlockerMatcherSelfJoinSequence based on a similarity join and a list of matchers.

    scc

    SampleClean Context

    sampleTableName
    simjoin

    Similarity Join

    matchers

    Because the similarity join should already contain a matching step, this parameter commonly refers either to a matcher that accepts all pairs, such as sampleclean.clean.deduplication.matcher.AllMatcher, or to an asynchronous matcher, such as sampleclean.clean.deduplication.matcher.ActiveLearningMatcher.

  2. new BlockerMatcherSelfJoinSequence(scc: SampleCleanContext, sampleTableName: String, blocker: Blocker, matchers: List[Matcher])

    scc

    SampleClean Context

    sampleTableName
    blocker
    matchers
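The note on the first constructor, that a similarity join already contains a matching step and so is usually followed by a pass-through matcher, can be illustrated with a small sketch. Everything here is a hypothetical stand-in rather than the SampleClean API: `CandidateSource`, `JaccardJoin`, and `acceptAll` are names invented for illustration, and Jaccard similarity over whitespace tokens is just one possible similarity metric.

```scala
// Illustrative stand-ins only; not the SampleClean SimilarityJoin or AllMatcher.
object ConstructorSketch {
  type Pair = (String, String)

  // Hypothetical common shape for "something that produces candidate pairs".
  trait CandidateSource {
    def candidates(data: Seq[String]): Seq[Pair]
  }

  // A similarity join blocks and matches in one step: only pairs whose
  // Jaccard similarity reaches the threshold are emitted.
  final class JaccardJoin(threshold: Double) extends CandidateSource {
    private def tokens(s: String): Set[String] =
      s.toLowerCase.split("\\s+").filter(_.nonEmpty).toSet

    private def jaccard(a: Set[String], b: Set[String]): Double =
      if (a.isEmpty && b.isEmpty) 1.0
      else (a & b).size.toDouble / (a | b).size

    def candidates(data: Seq[String]): Seq[Pair] =
      for {
        i <- data.indices
        j <- data.indices
        if i < j && jaccard(tokens(data(i)), tokens(data(j))) >= threshold
      } yield (data(i), data(j))
  }

  // Pass-through matcher, playing the role the text assigns to an all-pairs matcher.
  val acceptAll: Seq[Pair] => Seq[Pair] = pairs => pairs
}
```

Because the join has already filtered by similarity, the pass-through matcher returns its input unchanged; a stricter or asynchronous matcher could be substituted in the same position.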

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def addMatcher(matcher: Matcher): Unit

    Adds a new matcher to the matcher list.

    matcher

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def blockAndMatch(data: RDD[Row]): RDD[(Row, Row)]

    Executes the blocking and matching sequence on the given data and returns the matched pairs of rows.

  9. def changeSimilarity(newSimilarity: String): Unit

    Changes the similarity metric used in the entity resolution algorithm.

  10. def changeTokenization(newTokenization: String): Unit

  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  16. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. var matchers: List[Matcher]

  19. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. def printPipeline(): Unit

  23. def setOnReceiveNewMatches(func: (RDD[(Row, Row)]) ⇒ Unit): Unit

    Set a function that takes some action based on new results. This needs to be done if there is an asynchronous matcher at the end of the sequence.

  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  25. def toString(): String

    Definition Classes
    AnyRef → Any
  26. def updateContext(newContext: List[String]): Unit

  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
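The callback hook documented for setOnReceiveNewMatches can be sketched as follows. This is a hedged illustration, not the library's code: `SketchSequence` and its `deliver` method are hypothetical, `Seq[(String, String)]` stands in for `RDD[(Row, Row)]`, and the asynchronous matcher is simulated by calling `deliver` directly.

```scala
// Hypothetical stand-in for a sequence that ends in an asynchronous matcher.
object CallbackSketch {
  type Pair = (String, String)

  final class SketchSequence {
    // Default action does nothing, so batches delivered before registration are dropped.
    private var onNewMatches: Seq[Pair] => Unit = _ => ()

    // Register the action to run whenever a batch of new matches arrives.
    def setOnReceiveNewMatches(func: Seq[Pair] => Unit): Unit =
      onNewMatches = func

    // Simulates the asynchronous matcher handing over a batch at some later time.
    def deliver(batch: Seq[Pair]): Unit = onNewMatches(batch)
  }
}
```

In this sketch, results that arrive before a callback is registered are silently lost, which illustrates why the callback needs to be set when the last matcher is asynchronous.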
