Data Science with Bruno dos Santos
Bruno dos Santos began attending PyNights, presented by Booz Allen Hamilton at theClubhou.se, when he was still a self-described “n00b Data Scientist,” and he won! “It gave me so much confidence and helped drive my learning,” Bruno said. In fact, he won three in a row. “There was one close one where I won in the last five minutes,” said Bruno, continuing, “I experienced an adrenaline rush that never really went away.” Bruno loved the PyNight competition, but he noticed that when people got stuck on certain problems, few data scientists were in attendance to help. He also found that participants could have solved those problems with a little preparation. “I saw a need for a meetup that could prepare someone for PyNight,” Bruno shared, “and develop a small community of data scientists.”
Bruno approached theClubhou.se to serve as host for a Data Science meetup specifically using Python. Bruno explained, “I chose Python over R or Julia because it is a top-tier language with strong data science, web development, and general programming communities.” After some discussion about content, presentation, and expectations, a start date of March 2020 was established. Bruno and Co. were able to hold one face-to-face meeting before shifting to a fully virtual format.
The first few meetups had one-off lessons, providing insight into particular ways of applying Python to data science. As a core group of returning attendees formed, the group wanted a more focused approach so they could work through the basics of Python together. Bruno opted to use the Python Data Science Handbook. “It is a well-structured book with accompanying code that checks off the major areas of data science,” explained Bruno, adding, “It is freely available online for reference and has links to Google Colab notebooks where anyone can practice.” This new format of following along with a book and doing exercises increased engagement from both beginners and experienced programmers. Bruno shared, “It guided our conversations. Some days, the material was challenging, and we would slowly go through each section and discuss it. It was most helpful when we learned about PCA because the images showed how feature spaces and dimensionality reduction work in ways I couldn’t explain with words alone.”
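For readers curious about what dimensionality reduction with PCA looks like in code, here is a minimal sketch. It is not taken from the meetup or the Handbook: the toy data is invented, and it uses plain NumPy (an assumption; the Handbook itself works through PCA with its own examples) to find the direction along which the points vary most and then keep only that one coordinate.

```python
import numpy as np

# Hypothetical toy data: 200 points in 2D that mostly vary along one diagonal.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
points = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)])

# Center the data, then use the SVD to find the principal directions.
centered = points - points.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained = singular_values**2 / np.sum(singular_values**2)

# Dimensionality reduction: keep only the coordinate along the first component.
reduced = centered @ components[0]

print("variance explained:", explained)
print("shape before:", points.shape, "shape after:", reduced.shape)
```

Because the invented points lie close to a single line, the first component should capture nearly all of the variance, which is why dropping the second coordinate loses very little information.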
This meetup is the result of Bruno’s passion for data science permeating many aspects of his life. He works at a local startup called Jinfiniti Precision Medicine, incubated at the Medical College of Georgia. Bruno and the team at Jinfiniti develop assays to measure molecules in blood (biomarkers) to guide people to better health. “I’ve worked on clustering and classifying cancer types using RNA-seq data,” explained Bruno, “and I am actively developing our website with Django which includes data visualization, and in the near future I will be analyzing data from our clinical trials to determine if certain interventions have significant and strong effects on our biomarkers.” Clustering, classification, and data visualization have all been covered in the data science meetup over the past ten months, helping to illustrate the real-world application of this programming language and its processes.
Over the previous year, the most engaging topic was Pandas, a data manipulation package that makes Python practical for data science. After reading about it in the book, the attendees practiced solving small problems individually. Everyone had an opportunity to code and share their solutions, which let the group discuss the cleanest approaches and go in depth on technical aspects of Python and Pandas. Bruno said, “It worked well because we had a group practicing together and correcting each other’s mistakes. It’s only in practice where you discover your weaknesses, and you learn to correct them when you get immediate feedback.”
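To give a flavor of the kind of small Pandas problem that suits this format, here is a hypothetical exercise (the data and the question are invented for illustration, not taken from the meetup): given a table of practice results, find each attendee's average solving time.

```python
import pandas as pd

# Hypothetical practice data, invented for illustration.
scores = pd.DataFrame({
    "attendee": ["Ana", "Ben", "Ana", "Cy", "Ben", "Cy"],
    "problem": ["p1", "p1", "p2", "p1", "p2", "p2"],
    "minutes": [12, 20, 8, 15, 10, 30],
})

# Average solving time per attendee, fastest first.
average_minutes = (
    scores.groupby("attendee")["minutes"]
    .mean()
    .sort_values()
)

print(average_minutes)
```

There are several ways to reach the same answer (a pivot table, for example), which is what makes a problem like this a good prompt for comparing solutions.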
Coming up next, the data science club will explore Kaggle. They have been using Kaggle as their programming environment because it provides hassle-free access to the Python language and its packages. Bruno says that Kaggle has open-ended data science competitions with a strong community, elaborating, “My plan is to introduce our meetup to these competitions with quick overviews and approaches to solve them. Then, I want our attendees to try to solve them themselves, ask questions whenever they are stuck, share their solutions, and generate discussions. For everyone to get the most out of it, there needs to be consistent feedback throughout.”
For those looking into using Python for data science for the first time, Bruno recommends developing a habit of daily practice. “Try a few tutorials on YouTube: learn how to set up, the basics,” he says. “Find problems online and try to arrive at their solutions yourself. W3Schools has a good Python tutorial, as does python.org.” If you know Python basics and are trying to learn data science, he recommends practicing with the Python Data Science Handbook and then going over to Kaggle, trying out some competitions, and looking at notebooks from the community for inspiration. There are numerous resources across the internet.
If you’re stuck, know there’s a community at theClubhou.se that can help on the Slack channel #data-science, and you can drop your questions off at our fledgling website: https://datasciencemeetup.com/. And join the meetup on the first and third Thursdays of every month!