  1. Apr 20, 2021
  2. Mar 18, 2021
  3. Feb 07, 2021
  4. Jan 15, 2021
  5. Jan 11, 2021
  6. Jan 08, 2021
    • Enrico Guiraud's avatar
      [DF] Fix branch type inference in some cases · f105c495
      Enrico Guiraud authored
      - try harder to find branches and leaves users requested when
        trying to figure out their type
      - when building the list of valid column names, use the output of
        TBranch::GetFullName instead of the names (joined by dots) of
        the branches we traversed so far. With multiple nested
        TBranchElements, the traversal might be `a.b.c` while the name
        returned by TBranch::GetFullName (which is what TTree::GetBranch
        recognizes) might be simply `a.c` (see e.g. ROOT-9975).
      
      These changes fix ROOT-9975, although in some corner cases they might
      change which column names RDataFrame considers valid (any "reasonable"
      user application that was working should keep working -- no tutorial,
      test or integration benchmark we have was broken by these changes).
      f105c495
  7. Dec 01, 2020
  8. Nov 26, 2020
  9. Nov 05, 2020
    • Vincenzo Eduardo Padulano's avatar
      Avoid calling TTree::ChangeFile if file is TMemFile (#6570) · 6e8e5923
      Vincenzo Eduardo Padulano authored
      * Avoid calling TTree::ChangeFile if file is TMemFile
      
      * Add regression tests in TBufferMerger and dataframe_snapshot
      
      * [docs] Update documentation in TTree::Fill and TTree::ChangeFile
      
      * Avoid triggering TTree::ChangeFile mechanism in RDF Snapshot
      
      Create a new RAII struct that stores and changes the value of TTree::GetMaxTreeSize, and use it in RLoopManager::Run. Revert once issue #6640 is closed.
      6e8e5923
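The RAII struct mentioned in this commit message could look roughly like the sketch below. It is self-contained and does not use ROOT: `GetMaxTreeSize` and `SetMaxTreeSize` in the `sizedemo` namespace are hypothetical stand-ins for the static TTree functions, and raising the limit to the largest representable value is an assumption about how the ChangeFile mechanism would be kept from triggering, not a description of the actual patch.

```cpp
#include <cassert>
#include <limits>

namespace sizedemo {
// Hypothetical stand-in for the process-wide maximum tree size.
// The initial value is arbitrary and only serves the demonstration.
long long gMaxTreeSize = 1000;

long long GetMaxTreeSize() { return gMaxTreeSize; }

void SetMaxTreeSize(long long size)
{
   assert(size > 0); // a non-positive limit makes no sense
   gMaxTreeSize = size;
}

// Remembers the current limit on construction, raises it, and
// restores the old value on destruction, even if an exception
// unwinds the stack in between.
class MaxTreeSizeGuard {
   long long fOldSize;

public:
   MaxTreeSizeGuard() : fOldSize(GetMaxTreeSize())
   {
      // Assumption: the largest possible value means "never switch files".
      SetMaxTreeSize(std::numeric_limits<long long>::max());
   }
   ~MaxTreeSizeGuard() { SetMaxTreeSize(fOldSize); }
   MaxTreeSizeGuard(const MaxTreeSizeGuard &) = delete;
   MaxTreeSizeGuard &operator=(const MaxTreeSizeGuard &) = delete;
};

// Mimics an event loop: returns the limit that is in effect while it runs.
long long RunEventLoop()
{
   MaxTreeSizeGuard guard;
   return GetMaxTreeSize();
}
} // namespace sizedemo
```

Tying the restore to a destructor means the previous limit comes back on every exit path of the function, including early returns and exceptions.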
  10. Oct 12, 2020
    • Enrico Guiraud's avatar
      [DF] Use a RAII helper to pop/push slots from RSlotStack · 00b6d2c8
      Enrico Guiraud authored
      This prevents certain ugly error messages when an exception is
      thrown during a multi-threaded event loop: before this patch, the
      thread's slot number was never returned to the RSlotStack in that
      case, which could result in misleading error messages being printed
      on screen.
      00b6d2c8
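A minimal sketch of the pop/push RAII idea, assuming a much simpler slot stack than the real RSlotStack: `SlotStack`, `SlotGuard` and `ProcessTask` are hypothetical names used only for illustration. The guard pops a slot in its constructor and pushes it back in its destructor, so the slot is returned even when the task body throws.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

namespace slotdemo {
// Hypothetical, simplified pool of slot numbers shared by worker threads.
class SlotStack {
   std::vector<unsigned> fFree;
   std::size_t fCapacity;
   std::mutex fMutex;

public:
   explicit SlotStack(unsigned nSlots) : fCapacity(nSlots)
   {
      for (unsigned i = nSlots; i > 0; --i)
         fFree.push_back(i - 1);
   }
   unsigned Pop()
   {
      std::lock_guard<std::mutex> lock(fMutex);
      if (fFree.empty())
         throw std::runtime_error("no free slots left");
      const unsigned slot = fFree.back();
      fFree.pop_back();
      return slot;
   }
   void Push(unsigned slot)
   {
      std::lock_guard<std::mutex> lock(fMutex);
      assert(fFree.size() < fCapacity); // never return more slots than exist
      fFree.push_back(slot);
   }
   std::size_t NFree()
   {
      std::lock_guard<std::mutex> lock(fMutex);
      return fFree.size();
   }
};

// Pops a slot on construction and pushes it back on destruction.
class SlotGuard {
   SlotStack &fStack;
   unsigned fSlot;

public:
   explicit SlotGuard(SlotStack &stack) : fStack(stack), fSlot(stack.Pop()) {}
   ~SlotGuard() { fStack.Push(fSlot); }
   SlotGuard(const SlotGuard &) = delete;
   SlotGuard &operator=(const SlotGuard &) = delete;
   unsigned Slot() const { return fSlot; }
};

// Runs one task; returns false if the (simulated) user code threw.
bool ProcessTask(SlotStack &stack, bool userCodeThrows)
{
   try {
      SlotGuard guard(stack);
      if (userCodeThrows)
         throw std::runtime_error("error in user code");
      return true;
   } catch (const std::runtime_error &) {
      return false; // the guard has already returned the slot
   }
}
} // namespace slotdemo
```

With manual Pop/Push calls, the Push after the task body would be skipped by the exception, which is the situation the commit message describes.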
  11. Sep 28, 2020
    • Enrico Guiraud's avatar
      [DF] Early-quit event loop for RDataSource+Range · 62b1892a
      Enrico Guiraud authored
      RDataSource event loops did not check whether Ranges had been exhausted
      and always looped until the end of the dataset.
      Event processing was a no-op after Ranges had been exhausted, so results
      were correct, but runtimes were longer than necessary.
      
      This fixes #6455.
      62b1892a
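The early-quit check can be illustrated with a simplified loop. This is a sketch under stated assumptions, not RDataFrame's implementation: `Range` and `RunLoop` are hypothetical, and the stop condition used here (stop once every Range has seen its maximum number of entries) is a simplification of whatever bookkeeping the real computation graph needs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace rangedemo {
// Hypothetical Range node: lets at most fMax entries through.
struct Range {
   unsigned long long fMax;
   unsigned long long fSeen = 0;

   explicit Range(unsigned long long max) : fMax(max) {}

   bool Exhausted() const { return fSeen >= fMax; }
   void Process()
   {
      if (!Exhausted())
         ++fSeen; // after exhaustion, processing is a no-op
      assert(fSeen <= fMax);
   }
};

// Loops over nEntries entries of a data source and returns how many
// entries were actually read before the loop stopped.
unsigned long long RunLoop(unsigned long long nEntries, std::vector<Range> &ranges)
{
   unsigned long long nRead = 0;
   for (unsigned long long entry = 0; entry < nEntries; ++entry) {
      const bool allExhausted =
         !ranges.empty() &&
         std::all_of(ranges.begin(), ranges.end(), [](const Range &r) { return r.Exhausted(); });
      if (allExhausted)
         break; // nothing left to do: quit early instead of reading on
      ++nRead;
      for (auto &r : ranges)
         r.Process();
   }
   return nRead;
}
} // namespace rangedemo
```

Without the `allExhausted` check the loop would return the same per-Range counts but read all `nEntries` entries, which matches the "correct but slower" behavior the message describes.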
  12. Sep 18, 2020
  13. Sep 07, 2020
  14. Aug 19, 2020
    • Enrico Guiraud's avatar
      [DF] Simplify reading of RDataSource columns · 654a1dbd
      Enrico Guiraud authored
      Before this commit, RDataSource columns were treated like a special
      kind of Defined columns: they were registered in RBookedCustomColumn
      and their contents were accessed via RCustomColumn::Get.
      
      This commit removes the logic that was Define'ing ad-hoc columns
      corresponding to the RDS columns. Instead, we store the RDS column value
      pointers in a dedicated std::map and teach RDSColumnReader to directly
      use that. Logic is simpler, we avoid an extra function call and an extra
      copy upon data-source value accesses, and we move closer to implementing
      column readers specialized for a given RDS implementation.
      654a1dbd
  15. Aug 13, 2020
  16. Jul 27, 2020
    • Enrico Guiraud's avatar
      [DF] Add all spellings of a branch name to valid column names · 8c025bc7
      Enrico Guiraud authored
      Before this patch, in cases in which t.GetBranch("a.b") and
      t.GetBranch("b") both returned a valid address, RDataFrame was adding
      only "a.b" to the list of valid TTree columns.
      With this patch, both "a.b" and "b" are recognized as valid column names.
      
      We need this change in behavior to avoid a _worse_ change in behavior,
      described in detail in ROOT-10942: since ROOT-10702 was fixed,
      TTree::GetBranch became more powerful and started returning non-null
      addresses for branch names with form "a.b" while it was returning a
      nullptr until v6.20/06. With RDataFrame's previous logic, this in turn
      meant that valid code that was using "a" as a column broke as RDataFrame
      was now adding the "a.b" spelling to the list of valid columns instead.
      
      This fixes the RDF-related part of ROOT-10942: "a" is recognized as a
      valid spelling again, and "a.b" is kept as a new valid spelling.
      8c025bc7
  17. Jun 11, 2020
  18. May 06, 2020
  19. May 04, 2020
  20. Apr 30, 2020
    • Enrico Guiraud's avatar
      [DF] Fix memleaks of dataframe nodes kept alive for jitted code · 493e7a87
      Enrico Guiraud authored
      Before this patch, if an RLoopManager went out of scope before running
      its event loop (a rare occurrence, but indeed possible, especially in
      interactive sessions), we were leaking certain weak_ptrs and shared_ptrs
      to components of the computation graph that were meant to be used by
      lazily jitted code that was never actually jitted in the end.
      
      With this change, when _any_ RLoopManager runs the event loop, we
      trigger execution of _all_ jitted code that was registered, even if
      it was registered by an RLoopManager that is now out of scope. In the
      latter case, the code only performs clean-up of the heap-allocated
      weak_ptrs and shared_ptrs and exits. We still "leak" in the odd case
      in which RDF jitted code is registered but the application never actually
      triggers any RDF event loops, in which case the code to be lazily jitted
      remains in the pipeline until the end of the application, and parts of
      the computation graph are kept alive indefinitely by heap-allocated shared_ptrs.
      493e7a87
  21. Apr 29, 2020
  22. Apr 28, 2020
    • Enrico Guiraud's avatar
      [DF] Re-use jitted lambdas with same expression · daabce61
      Enrico Guiraud authored
      This should reduce the time required by jitting dramatically
      in many common scenarios, as we do not need a different template
      specialization of types that are expensive to instantiate for each
      Define or Filter expression.
      daabce61
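One way to picture the re-use is a cache keyed by the expression string. The sketch below is an assumption about the general shape of such a cache, not the code in this commit: `LambdaCache`, `GetOrDeclare` and the generated `lambdaN` names are hypothetical, and the step that would hand code to the interpreter is only marked by a comment.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

namespace jitdemo {
// Hypothetical cache mapping an expression to the name of the lambda
// that was declared for it.
class LambdaCache {
   std::unordered_map<std::string, std::string> fNames;
   unsigned fNDeclared = 0;

public:
   // Returns the lambda name for `expr`, declaring a new lambda only
   // the first time this exact expression is seen.
   const std::string &GetOrDeclare(const std::string &expr)
   {
      assert(!expr.empty());
      const auto it = fNames.find(expr);
      if (it != fNames.end())
         return it->second; // same expression: re-use the existing lambda
      const std::string name = "lambda" + std::to_string(fNDeclared++);
      // A real implementation would pass the lambda's code to the
      // interpreter here; this sketch only records the name.
      return fNames.emplace(expr, name).first->second;
   }
   unsigned NDeclared() const { return fNDeclared; }
};
} // namespace jitdemo
```

Because two Define or Filter calls with the same expression resolve to the same lambda, they also share its type, so templates instantiated on that type need to be compiled only once.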
  23. Apr 27, 2020
    • Enrico Guiraud's avatar
      [DF] Make RLoopManager's code to jit a static global · e879c0e7
      Enrico Guiraud authored
      In the future, we want separate computation graphs to share and re-use already
      jitted lambdas. Without this patch, however, we would have an ordering
      problem or a redefinition problem, because RDF2 might want to re-use/redefine
      a lambda that RDF1 is _going to_ declare, but (since we delay all
      jitting to right before an RDF's event loop) that might happen too late
      for RDF2.
      
      With this change, all RDFs can jit all code that has been booked by all
      other RDFs, resolving that problem. In addition, making fewer but "fatter"
      calls to the interpreter is a performance optimization in itself.
      e879c0e7
  24. Apr 15, 2020
  25. Apr 06, 2020
  26. Mar 27, 2020
  27. Mar 26, 2020
    • Enrico Guiraud's avatar
      [DF] Call CleanUpTask even if event loop throws · 2afd4b10
      Enrico Guiraud authored
      Even if something within the event loop throws, we still need to call
      CleanUpTask to make sure SnapshotHelperMT::FinalizeTask gets called,
      to avoid teardown issues due to input and output TTrees of a Snapshot
      being deleted concurrently.
      2afd4b10
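The pattern described here, running clean-up whether or not the loop body throws, can be sketched as follows. `Task` and `RunTask` are hypothetical stand-ins; the real CleanUpTask and SnapshotHelperMT::FinalizeTask are not reproduced, and the catch-and-rethrow form is one possible way to get this behavior, not necessarily the one used in the patch.

```cpp
#include <cassert>

namespace cleanupdemo {
// Hypothetical per-task state; fCleanedUp records whether clean-up ran.
struct Task {
   bool fCleanedUp = false;
   void CleanUp() { fCleanedUp = true; }
};

// Runs `body` and always cleans up the task afterwards. If `body`
// throws, clean-up runs first and the exception is then rethrown,
// so the caller still sees the original error.
template <typename F>
void RunTask(Task &task, F &&body)
{
   assert(!task.fCleanedUp); // a task is cleaned up exactly once
   try {
      body();
   } catch (...) {
      task.CleanUp();
      throw;
   }
   task.CleanUp();
}
} // namespace cleanupdemo
```

Rethrowing with a bare `throw;` keeps the original exception object and type intact for whoever handles it further up.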
  28. Mar 24, 2020