Class RandomSplitDataSet

  • All Implemented Interfaces:
    DataSet

    public class RandomSplitDataSet
    extends Object
    implements DataSet
    This class implements the DataSet interface by randomly splitting the collaborative filtering ratings stored in a text file. Each line of the ratings file must have the following format:
    <userId><separator><itemId><separator><rating>

    where <separator> is a special character that delimits the rating fields (semicolon by default).
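
The line format above can be illustrated with a minimal parsing sketch. `RatingLine` and its `parse` method are hypothetical names introduced only for this example and are not part of the documented class. The sketch also assumes that user and item identifiers are kept as strings and that the rating is a decimal number.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of RandomSplitDataSet.
final class RatingLine {
    final String userId;
    final String itemId;
    final double rating;

    RatingLine(String userId, String itemId, double rating) {
        this.userId = userId;
        this.itemId = itemId;
        this.rating = rating;
    }

    // Splits "<userId><separator><itemId><separator><rating>" into its three fields.
    static RatingLine parse(String line, String separator) {
        // Quote the separator so that characters such as "|" or "." are matched literally.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 fields but found " + fields.length + ": " + line);
        }
        return new RatingLine(
            fields[0].trim(), fields[1].trim(), Double.parseDouble(fields[2].trim()));
    }

    public static void main(String[] args) {
        // Uses the default separator described above (semicolon).
        RatingLine r = RatingLine.parse("42;1337;4.5", ";");
        System.out.println(r.userId + " rated " + r.itemId + " with " + r.rating);
    }
}
```

Quoting the separator matters because `String.split` treats its argument as a regular expression, so an unquoted separator such as `|` would not split the line as intended.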

    Ratings are assigned to the training set or the test set at random, according to the probability that a user and an item belong to the test set.
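
One plausible reading of this split can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: `RandomSplitSketch` and `isTestRating` are hypothetical names, the two percentages are assumed to be probabilities in the range [0, 1], and a rating is assumed to count as a test rating only when both its user and its item were drawn as test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a user/item based random split; not the library's implementation.
final class RandomSplitSketch {
    private final double testUsersPercent; // assumed to be a probability in [0, 1]
    private final double testItemsPercent; // assumed to be a probability in [0, 1]
    private final Random random;
    private final Map<String, Boolean> testUsers = new HashMap<>();
    private final Map<String, Boolean> testItems = new HashMap<>();

    RandomSplitSketch(double testUsersPercent, double testItemsPercent, long seed) {
        this.testUsersPercent = testUsersPercent;
        this.testItemsPercent = testItemsPercent;
        this.random = new Random(seed);
    }

    // Decides once per user and once per item whether it is "test",
    // then treats a rating as test only if both its user and its item are test.
    boolean isTestRating(String userId, String itemId) {
        boolean testUser = testUsers.computeIfAbsent(
            userId, id -> random.nextDouble() < testUsersPercent);
        boolean testItem = testItems.computeIfAbsent(
            itemId, id -> random.nextDouble() < testItemsPercent);
        return testUser && testItem;
    }

    public static void main(String[] args) {
        RandomSplitSketch split = new RandomSplitSketch(0.2, 0.1, 43L);
        String[][] ratings = {{"u1", "i1"}, {"u1", "i2"}, {"u2", "i1"}, {"u3", "i3"}};
        for (String[] r : ratings) {
            String set = split.isTestRating(r[0], r[1]) ? "test" : "training";
            System.out.println(r[0] + ", " + r[1] + " -> " + set);
        }
    }
}
```

Caching the decision per user and per item keeps the assignment consistent: every rating of the same user sees the same test or training status for that user.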

    • Constructor Detail

      • RandomSplitDataSet

        public RandomSplitDataSet(String filename)
                           throws IOException
        Generates a DataSet from a text file. The DataSet is loaded without test items and test users.
        Parameters:
        filename - File with the ratings.
        Throws:
        IOException - When the file is not accessible by the system with read permissions.
      • RandomSplitDataSet

        public RandomSplitDataSet(String filename,
                                  double testUsersPercent,
                                  double testItemsPercent)
                           throws IOException
        Generates a DataSet from a text file. The DataSet is loaded with a specific percentage of test items and test users.
        Parameters:
        filename - File with the ratings.
        testUsersPercent - Percentage of users assigned to the test set.
        testItemsPercent - Percentage of items assigned to the test set.
        Throws:
        IOException - When the file is not accessible by the system with read permissions.
      • RandomSplitDataSet

        public RandomSplitDataSet(String filename,
                                  double testUsersPercent,
                                  double testItemsPercent,
                                  long seed)
                           throws IOException
        Generates a DataSet from a text file. The DataSet is loaded with a specific percentage of test items and test users. This constructor takes a specific random seed so that experiments are reproducible.
        Parameters:
        filename - File with the ratings.
        testUsersPercent - Percentage of users assigned to the test set.
        testItemsPercent - Percentage of items assigned to the test set.
        seed - Seed applied to the random number generator.
        Throws:
        IOException - When the file is not accessible by the system with read permissions.
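
The effect of a fixed seed can be shown with a small standalone sketch. It uses `java.util.Random` directly and assumes, without confirmation from this page, that the class draws its test and training decisions from a generator of this kind. `SeededDraws` and `draw` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of seeded reproducibility; not part of RandomSplitDataSet.
final class SeededDraws {
    // Draws `count` yes/no decisions, each true with the given probability.
    static boolean[] draw(long seed, int count, double probability) {
        Random random = new Random(seed);
        boolean[] decisions = new boolean[count];
        for (int i = 0; i < count; i++) {
            decisions[i] = random.nextDouble() < probability;
        }
        return decisions;
    }

    public static void main(String[] args) {
        boolean[] first = draw(43L, 10, 0.2);
        boolean[] second = draw(43L, 10, 0.2);
        // The same seed produces the same sequence of decisions.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```

Because the sequence depends only on the seed, two runs over the same file with the same seed and percentages would be expected to yield the same split under this assumption.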
      • RandomSplitDataSet

        public RandomSplitDataSet(String filename,
                                  double testUsersPercent,
                                  double testItemsPercent,
                                  String separator)
                           throws IOException
        Generates a DataSet from a text file. The DataSet is loaded with a specific percentage of test items and test users.
        Parameters:
        filename - File with the ratings.
        testUsersPercent - Percentage of users assigned to the test set.
        testItemsPercent - Percentage of items assigned to the test set.
        separator - Separator char between file fields.
        Throws:
        IOException - When the file is not accessible by the system with read permissions.
      • RandomSplitDataSet

        public RandomSplitDataSet(String filename,
                                  String separator)
                           throws IOException
        Generates a DataSet from a text file. The DataSet is loaded without test items and test users.
        Parameters:
        filename - File with the ratings.
        separator - Separator char between file fields.
        Throws:
        IOException - When the file is not accessible by the system with read permissions.
      • RandomSplitDataSet

        public RandomSplitDataSet(String filename,
                                  double testUsersPercent,
                                  double testItemsPercent,
                                  String separator,
                                  long seed)
                           throws IOException
        Generates a DataSet from a text file. The DataSet is loaded with a specific percentage of test items and test users. This constructor takes a specific random seed so that experiments are reproducible.
        Parameters:
        filename - File with the ratings.
        testUsersPercent - Percentage of users assigned to the test set.
        testItemsPercent - Percentage of items assigned to the test set.
        separator - Separator char between file fields.
        seed - Seed applied to the random number generator.
        Throws:
        IOException - When the file is not accessible by the system with read permissions.
    • Method Detail

      • getRatingsIterator

        public Iterator<DataSetEntry> getRatingsIterator()
        Description copied from interface: DataSet
        This method generates an iterator to navigate through the raw ratings stored in DataSetEntries.
        Specified by:
        getRatingsIterator in interface DataSet
        Returns:
        Iterator of ratings
      • getTestRatingsIterator

        public Iterator<DataSetEntry> getTestRatingsIterator()
        Description copied from interface: DataSet
        This method generates an iterator to navigate through the raw test ratings stored in DataSetEntries.
        Specified by:
        getTestRatingsIterator in interface DataSet
        Returns:
        Iterator of test ratings
      • getNumberOfRatings

        public int getNumberOfRatings()
        Description copied from interface: DataSet
        This method indicates the number of (training) ratings.
        Specified by:
        getNumberOfRatings in interface DataSet
        Returns:
        Number of (training) ratings
      • getNumberOfTestRatings

        public int getNumberOfTestRatings()
        Description copied from interface: DataSet
        This method indicates the number of test ratings.
        Specified by:
        getNumberOfTestRatings in interface DataSet
        Returns:
        Number of test ratings