Commit 1706ebf3 authored and committed by Ivan Kabadzhov

[skip-ci][DF] Update `RCsvDS` docs

parent a53d4fec
......@@ -67,6 +67,7 @@ Please use their non-experimental counterparts `ROOT::TBufferMerger` and `ROOT::
- Fix the node counter of [`SaveGraph`](https://root.cern/doc/master/namespaceROOT_1_1RDF.html#ac06a36e745255fb8744b1e0a563074c9), where previously `cling` was getting wrong static initialization.
- Fix [`Graph`](https://root.cern/doc/master/classROOT_1_1RDF_1_1RInterface.html#a1ca9a94bece4767cac82968910afa02e) action (that fills a TGraph object) to properly handle containers and non-container types.
- The [`RCsvDS`](https://root.cern.ch/doc/master/classROOT_1_1RDF_1_1RCsvDS.html) class now allows users to specify column types, and can properly read empty entries of CSV files.
## Histogram Libraries
......
// Author: Enric Tejedor CERN 10/2017
/*************************************************************************
* Copyright (C) 1995-2017, Rene Brun and Fons Rademakers. *
* Copyright (C) 1995-2022, Rene Brun and Fons Rademakers. *
* All rights reserved. *
* *
* For the licensing terms see $ROOTSYS/LICENSE. *
......@@ -99,6 +99,9 @@ public:
/// (default `true`).
/// \param[in] delimiter Delimiter character (default ',').
/// \param[in] linesChunkSize Number of lines to read per chunk; use -1 to read all.
/// \param[in] colTypes Allows users to specify custom column types. Accepts an unordered map with column
/// names as keys and type specifiers as values ('O' for boolean, 'D' for double, 'L' for
/// Long64_t, 'T' for std::string).
RDataFrame MakeCsvDataFrame(std::string_view fileName, bool readHeaders = true, char delimiter = ',',
Long64_t linesChunkSize = -1LL, std::unordered_map<std::string, char> &&colTypes = {});
......
// Author: Enric Tejedor CERN 10/2017
/*************************************************************************
* Copyright (C) 1995-2017, Rene Brun and Fons Rademakers. *
* Copyright (C) 1995-2022, Rene Brun and Fons Rademakers. *
* All rights reserved. *
* *
* For the licensing terms see $ROOTSYS/LICENSE. *
......@@ -16,19 +16,22 @@
The RCsvDS class implements a CSV file reader for RDataFrame.
An RDataFrame that reads from a CSV file can be constructed using the factory method
ROOT::RDF::MakeCsvDataFrame, which accepts three parameters:
ROOT::RDF::MakeCsvDataFrame, which accepts five parameters:
1. Path to the CSV file.
2. Boolean that specifies whether the first row of the CSV file contains headers or
not (optional, default `true`). If `false`, header names will be automatically generated as Col0, Col1, ..., ColN.
3. Delimiter (optional, default ',').
The types of the columns in the CSV file are automatically inferred. The supported
types are:
- Integer: stored as a 64-bit long long int.
- Floating point number: stored with double precision.
- Boolean: matches the literals `true` and `false`.
4. Chunk size (optional, default -1, which reads all lines): number of lines to read at a time.
5. Column types (optional, default is an empty map): a map with column names as keys and their type
(expressed as a single character, see below) as values.
The type of columns that do not appear in the map is inferred from the data.
The supported types are:
- Integer: stored as a 64-bit long long int; can be specified in the column types map with 'L'.
- Floating point number: stored with double precision; specified with 'D'.
- Boolean: matches the literals `true` and `false`; specified with 'O'.
- String: stored as an std::string, matches anything that does not fall into any of the
previous types.
previous types; specified with 'T'.
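The interplay between the column types map and type inference can be sketched as follows. This is a minimal standalone illustration, not the RCsvDS implementation: `InferCellType` and `ResolveColumnType` are hypothetical helpers, the inference is done on a single cell, and treating an empty cell as a string is an arbitrary choice made only for this sketch.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of RCsvDS): guess the type specifier of one cell.
// 'O' = boolean, 'L' = 64-bit integer, 'D' = double, 'T' = std::string.
char InferCellType(const std::string &cell)
{
   if (cell == "true" || cell == "false")
      return 'O';
   if (cell.empty())
      return 'T'; // assumption of this sketch only

   const char *begin = cell.c_str();
   const char *last = begin + cell.size();
   char *end = nullptr;

   // The whole cell must be consumed, and must fit, to count as an integer.
   errno = 0;
   std::strtoll(begin, &end, 10);
   if (errno == 0 && end == last)
      return 'L';

   // Otherwise try a floating point number.
   std::strtod(begin, &end);
   if (end == last)
      return 'D';

   // Anything else is kept as a string.
   return 'T';
}

// Hypothetical helper: an entry in the user-provided map takes precedence;
// columns that do not appear in the map fall back to inference.
char ResolveColumnType(const std::string &colName, const std::string &firstCell,
                       const std::unordered_map<std::string, char> &colTypes)
{
   const auto it = colTypes.find(colName);
   return it != colTypes.end() ? it->second : InferCellType(firstCell);
}
```

With a map containing only `{"id", 'T'}`, a column named `id` whose first cell is `42` resolves to 'T', while any other column with the same cell resolves to 'L'.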
These are some formatting rules expected by the RCsvDS implementation:
- All records must have the same number of fields, in the same order.
......@@ -68,6 +71,10 @@ double-quote characters must be represented by a pair of double-quote characters
The current implementation of RCsvDS reads the entire CSV file content into memory before
RDataFrame starts processing it. Therefore, before creating a CSV RDataFrame, it is
important to check both how much memory is available and the size of the CSV file.
RCsvDS can handle empty cells and also accepts the special keywords "NaN" and "nan" to
indicate `nan` values. If the column is of type double, these cells are stored internally as `nan`.
Empty cells and explicit `nan` values in columns of type Long64_t or bool are stored as zeros.
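The rules for empty and `nan` cells can be illustrated with a small standalone sketch. The helpers `IsMissing`, `ToDouble`, `ToLong64` and `ToBool` are hypothetical names used only here, `long long` stands in for Long64_t, and the code mirrors the behaviour described above rather than the actual RCsvDS internals.

```cpp
#include <limits>
#include <string>

// Hypothetical helper: a cell counts as missing if it is empty or spells "NaN"/"nan".
bool IsMissing(const std::string &cell)
{
   return cell.empty() || cell == "NaN" || cell == "nan";
}

// Columns of type double: missing cells become nan.
double ToDouble(const std::string &cell)
{
   return IsMissing(cell) ? std::numeric_limits<double>::quiet_NaN() : std::stod(cell);
}

// Columns of type Long64_t (long long in this sketch): missing cells become 0.
long long ToLong64(const std::string &cell)
{
   return IsMissing(cell) ? 0LL : std::stoll(cell);
}

// Columns of type bool: missing cells become false, i.e. zero.
bool ToBool(const std::string &cell)
{
   return IsMissing(cell) ? false : cell == "true";
}
```

One consequence worth keeping in mind: once a missing integer or boolean has been stored as zero, it can no longer be told apart from a genuine 0 or `false` in the data.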
*/
// clang-format on
......@@ -318,6 +325,9 @@ size_t RCsvDS::ParseValue(const std::string &line, std::vector<std::string> &col
/// (default `true`).
/// \param[in] delimiter Delimiter character (default ',').
/// \param[in] linesChunkSize Number of lines to read per chunk; use -1 to read all.
/// \param[in] colTypes Allows users to manually specify column types. Accepts an unordered map with keys being
/// column names, values being type specifiers ('O' for boolean, 'D' for double, 'L' for
/// Long64_t, 'T' for std::string)
RCsvDS::RCsvDS(std::string_view fileName, bool readHeaders, char delimiter, Long64_t linesChunkSize,
std::unordered_map<std::string, char> &&colTypes)
: fReadHeaders(readHeaders), fCsvFile(ROOT::Internal::RRawFile::Create(fileName)), fDelimiter(delimiter),
......