wlevine/daru

This branch is 392 commits behind SciRuby/daru:master.

daru

Data Analysis in RUby

Introduction

daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.

daru is inspired by pandas, a very mature solution in Python.

daru is written in pure Ruby, so it should work with all Ruby implementations. It is tested with MRI 2.0, 2.1 and 2.2.

Features

  • Data structures:
    • Vector - A basic 1-D vector.
    • DataFrame - A 2-D spreadsheet-like structure for manipulating and storing data sets. This is daru's primary data structure.
  • Compatible with IRuby notebook, statsample and statsample-glm.
  • Singly and hierarchically indexed data structures.
  • Flexible and intuitive API for manipulation and analysis of data.
  • Easy plotting, statistics and arithmetic.
  • Plentiful iterators.
  • Optional speed and space optimization on MRI with NMatrix and GSL.
  • Easy splitting, aggregation and grouping of data.
  • Quick data summaries with pivot tables.
  • Import and export of data sets from and to Excel, CSV, databases and plain text files.
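
To make the "splitting, aggregation and grouping" feature concrete, here is a minimal plain-Ruby sketch of the idea on an array of hashes. It deliberately does not use daru, and it is not daru's API: the helper name `sum_by` and the sample rows are made up for illustration.

```ruby
# Hypothetical helper, not part of daru: split rows into groups by one
# column, then aggregate another column within each group by summing it.
def sum_by(rows, key, value)
  rows.group_by { |row| row[key] }.each_with_object({}) do |(group_key, group), totals|
    totals[group_key] = group.map { |row| row[value] }.reduce(0, :+)
  end
end

# Made-up sample rows, for illustration only.
rows = [
  { city: 'Pune',   sales: 10 },
  { city: 'Mumbai', sales: 7 },
  { city: 'Pune',   sales: 5 }
]

sum_by(rows, :city, :sales) # => {:city values mapped to summed :sales}
```

A DataFrame library takes care of this kind of bookkeeping for you. See the documentation for the grouping methods daru actually provides.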

Notebooks

Usage

Case Studies

Blog Posts

Documentation

Docs can be found here.

Roadmap

  • Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
  • Basic Data manipulation and analysis operations:
    • DF concat
  • Assignment of a column to a single number should set the entire column to that number.
  • == between a Daru::Vector and a string or number.
  • Multiple column assignment with []=
  • Multiple value assignment for vectors with []=.
  • #find_max function which will evaluate a block and return the row for which the value of the block is max.
  • Function to check if a value of a row/vector is within a specified range.
  • Create a new vector in map_rows if any of the already present rows don't match the one assigned in the block.
  • Sort by index.
  • Statistics on DataFrame over rows and columns.
  • Cumulative sum.
  • Calculate percentage change.
  • Have some sample data sets for users to play around with. Should be able to load these from the code itself.
  • Sorting with missing data present.
  • Change internals of indexes to raise errors when a particular index is missing and the passed key is a Fixnum. Right now we just return the Fixnum for convenience.
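
For anyone picking up the cumulative sum, percentage change or #find_max items above, the following plain-Ruby sketch shows one reading of what they could compute. It works on ordinary Arrays, and every method name and signature in it is an assumption for illustration. The eventual daru implementation may differ, for example in how it treats missing data.

```ruby
# Hypothetical helpers on plain Arrays; none of these are daru API.

# Running total of the values seen so far.
def cumulative_sum(values)
  total = 0
  values.map { |v| total += v }
end

# Fractional change relative to the previous element. The first element
# has no predecessor, so its change is reported as nil.
def percent_change(values)
  return [] if values.empty?
  [nil] + values.each_cons(2).map { |prev, curr| (curr - prev).to_f / prev }
end

# Evaluate the block for every row and return the row with the largest result.
def find_max(rows, &block)
  rows.max_by(&block)
end

prices = [100, 110, 99] # made-up sample data
cumulative_sum(prices)  # => [100, 210, 309]
percent_change(prices)  # => [nil, 0.1, -0.1]
```

Returning nil for the first percentage change keeps the result the same length as the input, which makes it easy to store alongside the original values.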

Contributing

Pick a feature from the Roadmap or the issue tracker or think of your own and send me a Pull Request!

Acknowledgements

  • Google and the Ruby Science Foundation for the Google Summer of Code 2015 grant for further developing daru and integrating it with other ruby gems.
  • Thank you last.fm for making user data accessible to the public.

Copyright (c) 2015, Sameer Deshmukh. All rights reserved.
