Data Science @ OSU

Data Science @ OSU (DS @ OSU) places powerful computational environments and resources at the fingertips of students and instructors. Designed to support interactive data science, it provides over 40 programming languages, including Python, R, and Julia, in a streamlined and highly accessible cloud environment. DS @ OSU supports integrated explanations and code, making data science understandable, repeatable, and shareable.

DS @ OSU + Canvas

Integrated with Canvas, DS @ OSU gives students seamless access not only to assignments and learning materials but also to their own personal learning and computing environments, all within a web browser. The integration also gives faculty and student TAs straightforward access to review and provide feedback on student assignments.

[Image: Example of DS @ OSU]

DS @ OSU Benefits

Students learn and code in cutting-edge computing environments through a web browser, with access to powerful data science services and resources and nothing to install. DS @ OSU supports students in following instruction and in writing and testing their own code at their own pace, in their own environment.

  • Open access to powerful computing environments
  • Cloud-based software accessed from your browser
  • Integrated with Canvas for seamless learning 
  • Reduced reliance on classroom computers and labs to complete coursework

FAQ

Instructors can get started with DS @ OSU by contacting the DRI Team.

Students will access DS @ OSU through Canvas.

A Canvas course is available to support instructors using DS @ OSU. Contact DRI to be added to the course. 

Based on the datascience-notebook Jupyter Docker Stack, we support:

  • JupyterLab, the latest-gen Jupyter interface
  • Tools and Interfaces:
    • Jupyter notebooks & Python 3
    • R, RStudio, and R Shiny
    • Julia and bash (command-line)
  • A wide array of pre-installed Python and R packages
  • For each Hub (generally we set up one Hub per class), a shared storage space with "classroom" permissions:

    • Students can read+write in their own home directories
    • Instructors (or other admins such as TAs) can directly browse and edit student data
    • A hub_data_share for instructor staging of data and code
  • All users can install scripts and R and Python packages for their own use

  • Instructors can install scripts and R and Python packages for everyone

  • Additional hooks for instructors to customize user environments

  • Automatic login via Canvas, including support for social logins from Canvas Studio Sites

    • as Links from Assignments and Modules
    • Specific Canvas Roles (e.g. Instructors and/or TAs) determine Admin-level access within a Hub
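
As an illustration of the per-user package installation listed above, the sketch below installs a Python package for the current user from inside a notebook. It assumes pip's `--user` scheme is how per-user installs work on a given Hub, which this page does not confirm. The helper names `user_install_command` and `install_for_current_user` and the package name `example-package` are hypothetical.

```python
import site
import subprocess
import sys


def user_install_command(package):
    """Build a pip command that installs `package` for the current user only."""
    # sys.executable makes pip run under the same Python as the notebook kernel.
    return [sys.executable, "-m", "pip", "install", "--user", package]


def install_for_current_user(package):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.run(user_install_command(package), check=False).returncode


if __name__ == "__main__":
    # Show where per-user packages would land. On a Hub this is assumed to be
    # under the user's home directory; nothing is installed by this demo.
    print("User site-packages:", site.getusersitepackages())
    print("Command:", " ".join(user_install_command("example-package")))
```

An install for everyone on a Hub would use whatever mechanism DRI provides to instructors; that mechanism is not described here, so it is not sketched.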

As much as possible, we've designed DS @ OSU on the principle of making the easy things easy, and the hard things possible. Simultaneously, we've built a system that scales up and down, able to support dozens to hundreds (potentially thousands) of students. Lastly, a number of features and backend support systems are still in development and testing. 

Please be especially aware of these items as a user of DS @ OSU:

  • Autoscaling Wait Times: Hub access can at times be delayed for several minutes while cloud resources are created on the fly to support the users logging in.
    • This can often be avoided with planning. We ask that Instructors have a working understanding of computing resources to help manage their class (described in the "Cloud Server Management" sections).
       
  • Getting Access: Processes for Hub creation and decommissioning are a work in progress.
    • Currently, this is handled by a request to DRI. 
       
  • Canvas + Hubs: 'Connecting' a Hub to a Canvas course requires some setup steps within the Canvas course settings.
    • This can be done by the instructor, or by adding a support person as a "Designer" who can make the connection (and edit course content, but cannot see FERPA-protected information such as grades and discussions). 
       
  • Computation Limits: New! Previously, users were limited to 2 GB of RAM and 2 CPU cores, but we are open-testing a time-based quota system for cost-effective use of larger compute resources in the cloud, including GPU compute (with TensorFlow 2 installed).
  • Data Storage Limits: Data storage is designed as a single shared space for all users within a single Hub.
    • This space defaults to 40 GB, but we can allocate larger spaces (up to 500 GB) on request during Hub creation.
       
  • Data Storage Sharing: There are currently no per-user storage limits within the shared space (this is a difficult challenge we're working on). Running out of space will prevent the creation of new files.
    • Deleting data frees the space again. We are working on tools for Instructors and TAs to identify "data hogs".
       
  • Data Retention: Long-term data storage and management is not supported at this time.
    • We can create a Hub up to a month prior to a course for instructors to stage data and packages, but we require all data to be removed within 2 weeks after the end of a course.
       
  • Backups and Recovery
    • At this time, restoring data on Hub instances is not available. We're working to enable backups in the near future.
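
Because storage within a Hub is shared and has no per-user limits, it can help to check how much space remains and which directories are largest. The following is a minimal sketch using only the Python standard library. The function names and the idea of scanning one subdirectory per user are assumptions for illustration, not a DS @ OSU tool, and the location of a Hub's shared space is not given on this page, so the path is left as a parameter.

```python
import os
import shutil


def free_space_gb(path):
    """Return free space in gigabytes (10**9 bytes) on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def directory_size_bytes(path):
    """Sum the sizes of all regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                try:
                    total += os.path.getsize(full)
                except OSError:
                    # The file vanished or is unreadable; leave it out of the total.
                    pass
    return total


def largest_subdirectories(parent, top_n=5):
    """Return up to `top_n` (name, bytes) pairs for subdirectories of `parent`, largest first."""
    sizes = [
        (entry.name, directory_size_bytes(entry.path))
        for entry in os.scandir(parent)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top_n]
```

Calling `largest_subdirectories` on a directory that holds one folder per student would list the biggest consumers first. The sizes are apparent file sizes and can differ from the space actually occupied on disk.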