sampleclean.clean.deduplication

ActiveLearningStrategy

case class ActiveLearningStrategy(displayedColNames: List[String], featurizer: Featurizer) extends Product with Serializable

Defines an Active Learning strategy that asynchronously runs an Active Learning algorithm for deduplication. New models are trained from the supplied starting labels and from labels gathered through Amazon Mechanical Turk.

displayedColNames

column names of the main data set, i.e. the columns that are visible to the user.

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new ActiveLearningStrategy(displayedColNames: List[String], featurizer: Featurizer)

    displayedColNames

    column names of the main data set, i.e. the columns that are visible to the user.

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def asyncRun(labeledInput: RDD[(String, LabeledPoint)], candidatePairs: RDD[(Row, Row)], colMapper1: (List[String]) ⇒ List[Int], colMapper2: (List[String]) ⇒ List[Int], onUpdateDupCounts: (RDD[(Row, Row)]) ⇒ Unit, passthrough: Boolean = false): ActiveLearningTrainingFuture[SVMModel]

    This method is the main executor of the Active Learning Strategy.

    labeledInput

    initial labels used for training. An empty RDD is valid.

    candidatePairs

    Pairs that will be compared using the crowd. The two data sets being compared can have different column schemas.

    colMapper1

    function that converts a list of column names in the first schema into a list of those columns' indices.

    colMapper2

    function that converts a list of column names in the second schema into a list of those columns' indices.

    onUpdateDupCounts

    callback into SampleClean that updates the sample table after the Active Learning algorithm.

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. var crowdParameters: CrowdConfiguration

  10. var currentModel: SVMModel

  11. val displayedColNames: List[String]

    column names of the main data set, i.e. the columns that are visible to the user.

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. val featurizer: Featurizer

  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. var frameworkParameters: ActiveLearningParameters

  16. def getActiveLearningParameters: ActiveLearningParameters

    Gets the current Active Learning framework parameters.

  17. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  18. def getCrowdParameters: CrowdConfiguration

    Gets the current crowd (Label-Getter) parameters.

  19. def getSVMParameters: SVMParameters

    Gets the current SVM parameters.

  20. def getTaskParameters: CrowdTaskConfiguration

    Gets the current crowd task parameters.

  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. def setActiveLearningParameters(frameworkParameters: ActiveLearningParameters): ActiveLearningStrategy

    Used to set new Active Learning framework parameters that will be used for training.

    frameworkParameters

    parameters to set.

  26. def setCrowdParameters(crowdParams: CrowdConfiguration): ActiveLearningStrategy

    Used to set new crowd parameters that will be used for training.

    crowdParams

    parameters to set.

  27. def setSVMParameters(svmParameters: SVMParameters): ActiveLearningStrategy

  28. def setTaskParameters(crowdTaskParams: CrowdTaskConfiguration): ActiveLearningStrategy

    Used to set new crowd task parameters that will be used for training.

    crowdTaskParams

    parameters to set.

  29. var svmParameters: SVMParameters

  30. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  31. var taskParameters: CrowdTaskConfiguration

  32. def updateContext(context: List[String]): Unit

  33. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  34. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
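
The colMapper1 and colMapper2 arguments of asyncRun are described above as functions that convert a list of column names into a list of those columns' indices, one function per schema. The following is a minimal sketch of how such a function might be built in plain Scala. The object ColMapperSketch, the helper makeColMapper, and both example schemas are hypothetical and are not part of SampleClean; the sketch also assumes the schema is available as an ordered list of column names.

```scala
// Hypothetical helper, not part of SampleClean: builds a column mapper
// of type List[String] => List[Int] from an ordered list of column names.
object ColMapperSketch {

  def makeColMapper(schema: List[String]): List[String] => List[Int] = {
    // Precompute name -> position so each lookup is a map access.
    val indexOf: Map[String, Int] = schema.zipWithIndex.toMap
    (colNames: List[String]) =>
      colNames.map { name =>
        // Fail loudly on a column that is not in this schema.
        indexOf.getOrElse(
          name,
          throw new IllegalArgumentException(s"Unknown column: $name"))
      }
  }

  def main(args: Array[String]): Unit = {
    // Two made-up schemas with the shared columns in different positions.
    val colMapper1 = makeColMapper(List("id", "name", "address", "city"))
    val colMapper2 = makeColMapper(List("restaurant_id", "city", "name", "address"))

    val displayed = List("name", "city")
    println(colMapper1(displayed)) // List(1, 3)
    println(colMapper2(displayed)) // List(2, 1)
  }
}
```

Because each mapper closes over its own schema, the same list of displayed column names resolves to different indices for the two data sets, which matches the note above that the compared data sets can have different column schemas.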
