Data Science @ OSU
Data Science @ OSU (DS @ OSU) places powerful computational environments and resources at the fingertips of students and instructors. DS @ OSU is designed to support interactive data science and provides over 40 programming languages, including Python, R, and Julia, in a streamlined and highly accessible cloud environment. DS @ OSU supports integrated explanations and coding, making data science understandable, repeatable, and shareable.
DS @ OSU + Canvas
Integrated with Canvas, DS @ OSU gives students seamless access not only to assignments and learning materials but also to their own personal learning and computing environments, all within a web browser. The integration also gives faculty and student TAs straightforward access to review and provide feedback on student assignments.
DS @ OSU Benefits
Students learn and code in cutting-edge computing environments through a web browser, with access to powerful data science services and resources and nothing to install. DS @ OSU supports students in following instruction and in writing and testing their own code, at their own pace and in their own environment.
- Open access to powerful computing environments
- Cloud-based software accessed from your browser
- Integrated with Canvas for seamless learning
- Reduced reliance on classroom computers and labs to complete coursework
FAQ
Instructors can get started with DS @ OSU by contacting the DRI Team.
Students will access DS @ OSU through Canvas.
A Canvas course is available to support instructors using DS @ OSU. Contact DRI to be added to the course.
Based on the datascience-notebook Jupyter Docker Stack, we support:
- JupyterLab, the latest-gen Jupyter interface
- Tools and Interfaces:
- Jupyter notebooks & Python 3
- R, RStudio, and R Shiny
- Julia and bash (command-line)
- A wide array of pre-installed Python and R packages
- For each Hub (generally we set up one Hub per class), a shared storage space with "classroom" permissions:
- Students can read+write in their own home directories
- Instructors (or other admins such as TAs) can directly browse and edit student data
- A hub_data_share for instructor staging of data and code
- All users can install scripts and R and Python packages for their own use
- Instructors can install scripts and R and Python packages for everyone
- Additional hooks for instructors to customize user environments
- Automatic login via Canvas, including support for social logins from Canvas Studio Sites
- as Links from Assignments and Modules
- Specific Canvas Roles (e.g. Instructors and/or TAs) determine Admin-level access within a Hub
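As a rough illustration of the "install packages for their own use" item above, here is a minimal sketch of a per-user Python package install. It assumes the standard `pip install --user` mechanism, which places packages under the current user's home directory. Whether DS @ OSU recommends this exact approach is an assumption, and the helper names below are hypothetical.

```python
import subprocess
import sys


def pip_user_install_command(packages):
    """Build a pip command that installs packages for the current user only.

    Assumption: the Hub allows standard `pip install --user` installs.
    Using sys.executable ties the install to the running Python interpreter.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    return [sys.executable, "-m", "pip", "install", "--user", *packages]


def install_for_current_user(packages):
    """Run the per-user install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_user_install_command(packages))
```

Calling `install_for_current_user(["some-package"])` with a real package name would run the install. Inside a Jupyter notebook, the `%pip install --user some-package` magic is a comparable shortcut.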
As much as possible, we've designed DS @ OSU on the principle of making the easy things easy, and the hard things possible. Simultaneously, we've built a system that scales up and down, able to support dozens to hundreds (potentially thousands) of students. Lastly, a number of features and backend support systems are still in development and testing.
Please be especially aware of these items as a user of DS @ OSU:
- Autoscaling Wait Times: Hub access can at times be delayed for several minutes while cloud resources are created on the fly to support new users.
- This can often be avoided with planning. We ask that Instructors have a working understanding of computing resources to help manage their class (described in the "Cloud Server Management" sections).
- Getting Access: Processes for Hub creation and decommissioning are a work in progress.
- Currently, this is handled by a request to DRI.
- Canvas + Hubs: 'Connecting' a Hub to a Canvas course requires some setup steps within the Canvas course settings.
- This can be done by the instructor, or by adding a support person as a "Designer" who can make the connection (and edit course content, but cannot see FERPA-protected information such as grades and discussions).
- Computation Limits: New! Previously, users were limited to 2 GB of RAM and 2 CPU cores; we are now open-testing a time-based quota system for cost-effective use of larger compute resources in the cloud, including GPU compute (with TensorFlow 2 installed).
- Data Storage Limits: Data storage is designed as a single shared space for all users within a single Hub.
- This space defaults to 40 GB, but we can allocate larger spaces (up to 500 GB) on request during Hub creation.
- Data Storage Sharing: There are currently no per-user storage limits within the shared space (this is a difficult challenge we're working on). Running out of space will prevent the creation of new files.
- Deleting data frees the space again. We are working on tools for Instructors and TAs to identify "data hogs".
- Data Retention: Long-term data storage and management is not supported at this time.
- We can create a Hub up to a month prior to a course for instructors to stage data and packages, but require all data to be removed within 2 weeks after the end of a course.
- Backups and Recovery
- At this time, restoring data on Hub instances is not available. We're working to enable backups in the near future.
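Until dedicated tooling is available, the shared-storage points above can be approximated with a short script. The following is a minimal sketch, not an official DS @ OSU tool: it assumes that each user's data sits in its own directory under a single shared root, and the function names are hypothetical. It uses only the Python standard library.

```python
import os
import shutil
from pathlib import Path


def directory_size(path: Path) -> int:
    """Total size in bytes of the files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def usage_by_directory(shared_root: Path) -> list[tuple[str, int]]:
    """Return (directory name, bytes used) pairs, largest first."""
    sizes = [
        (entry.name, directory_size(entry))
        for entry in shared_root.iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def free_bytes(shared_root: Path) -> int:
    """Bytes still available on the filesystem that holds shared_root."""
    return shutil.disk_usage(shared_root).free
```

An instructor or TA with read access could call `usage_by_directory(Path(...))` on the directory that holds student home directories (the actual location on a Hub is not specified here) and compare the largest entries against `free_bytes` for the same path.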