Who I am: My name is Natasha and I am a senior at the Wharton School. While studying Business Analytics, I decided to apply for a job at Eventbrite because I think live events are great fun (like pop-up lectures and film screenings!), and I want to help people build meaningful connections.
Project goal: I decided to create and test my own product idea, which I call "Brite Box." As I demonstrate through my prototypes, the goal of "Brite Box" is to use machine learning to generate personalized event recommendations for Eventbrite's premium customers every week, saving them the hassle of searching through thousands of events in their area, or (even worse) missing out on the best local events that they really want to go to.
"Brite Box" predicts and recommends the top 5 events for each attendee as part of their monthly premium subscription package. These recommendations could be delivered in a weekly email, combined with a tailored user experience (UX) on the attendee's Eventbrite homepage, which I have prototyped below. I use R and its visualization packages, as well as mockups in proto.io, to test these concepts.
Current product: An organic Google search for "Eventbrite" directs the user to these two pages (see above and right). Both require the user to enter their location and then sort through the events in their area by category.
The results are based on the searches most relevant to "London" and show every type of event across the city. The trouble is that thousands of events are hosted on Eventbrite's platform in London at any given time. This paradox of choice is overwhelming and may well lower the conversion rate to ticket purchases. This led me to two problem statements:
- How might we create a smarter, more personalized experience on Eventbrite's platform?
- How might we help each premium attendee find a couple of events that they really want to go to, and create a sense of FOMO (fear of missing out)?
Measuring the concept's success: The solution to these problems should help Eventbrite better retain and regularly engage its many attendees. Right now, Eventbrite might just be a platform that shows these attendees a ton of other people's events. With "Brite Box", it could become a platform that helps each attendee discover new things they love and encourages them to go do awesome things.
The "Brite Box" concept will be successful if it increases the number of tickets bought per attendee on the Eventbrite website. For example, the current conversion rate might be 1 event every 3 months per attendee, and we could try to increase this to 2 events per month per attendee.
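To make the size of that example target concrete, here is the arithmetic as a short Python sketch (the analysis in this project was done in R; Python is used here only for illustration). Both rates are the illustrative figures from the paragraph above, not measured Eventbrite data; the sketch converts them to the same unit and compares them.

```python
from fractions import Fraction

# Illustrative rates from the text (not measured data), in tickets per attendee per month.
baseline = Fraction(1, 3)  # 1 event every 3 months
target = Fraction(2)       # 2 events per month

# How many times more often each attendee would need to buy tickets.
uplift = target / baseline
print(f"per year: {baseline * 12} -> {target * 12} events; uplift = {uplift}x")
```

In other words, the example target is a sixfold increase in purchase frequency, from 4 to 24 events per attendee per year.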
Competing products: I like to go to concerts a couple of times a year, and my experience with Ticketmaster is very different to that of Eventbrite.
As you can see in the screenshots above, Ticketmaster engaged with me through a weekly email, sending me updates on cool bands that were coming to Philly. Their emails eventually got a bit annoying, but their persistence paid off: Ticketmaster knew that I wanted to see Mac DeMarco before I did.
Who this project is for: To explore the event needs of premium Eventbrite attendees, my example of "Brite Box" is built for two very specific users: my parents, Karen and Peter Doherty. They just moved to a new neighborhood in West London and want to use Eventbrite to reimagine their lives there together, so I start by using Eventbrite API data to aggregate events for them.
Implications of my findings: My aim is to create 5 personalized event recommendations for my parents from the Eventbrite API data, to demonstrate my concept of "Brite Box".
User Research: "Empty-Nest-Explorers"
Peter and Karen Doherty are in their early 50s, and they just moved to the Portobello Market area of Notting Hill, West London. Their youngest child is about to leave home for university, so they will soon have more free time for events than they had before.
Though both of them work full time, they like to take the weekends off to go walking, as they love their dogs and the outdoors. Peter enjoys cycling and Karen is a frequent gym-goer.
During the holidays, they like to travel a lot (as you can see below, with Peter and some gorillas in Rwanda). They occasionally like to support their friends' charity events, such as Art for Youth, an annual art exhibition in London that sells affordable art to raise money for a UK youth charity.
Their house is near many bars and restaurants in the Portobello Market area of West London. Though Peter and Karen's friends live nearby, they do not yet know any of their immediate neighbors well enough to go to social gatherings with them.
MAIN TRENDS OF THE DATASET
Using the Eventbrite API website, I scraped data on about 2500 events near my parents' house. My data extraction and cleaning process is described in the appendix titled "Data Extraction and Cleaning". This section explores some basic facts about the "fullevents" dataset that I am working with.
1. This histogram shows that most of the events I found take place at the end of March and the beginning of April, because I scraped the data in early March. This is promising: it suggests there will be enough events to make fresh recommendations for Karen and Peter every week, although a single scrape only covers the next few weeks, so the data would need to be refreshed regularly.
2. A typical event has a capacity of about 120 people, although the mean is far higher: the largest event had a capacity of 16 million (truncated in the histogram below), which on its own adds more than 6,000 to the mean across roughly 2500 events. The smallest event had a capacity of 38. Ten events had a capacity of more than 50,000, but most had a capacity of less than 1500. The distribution also shows that hosts tend to pick round numbers for capacity, like 500 and 1000.
3. Using the Desc() function below, we can see that the most common start times are in the evening: 6pm, 6:30pm and 7pm together account for about 900 of the 2500 events I scraped. There are also many morning events, at 9am, 9:30am and 10am.
4. The most common event categories are 101 and 102 (disregarding the NA column), while categories 116, 118 and 120 are rare. If only I had a data key to know what these category IDs meant!
5. Only a tiny fraction (0.4%) of the events in my dataset have reserved seating.
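The five trends above were computed in R. As a rough Python sketch of the same summaries, the function below assumes each event has already been flattened into a dictionary with hypothetical keys `start_local` (ISO timestamp), `capacity`, `category_id` and `is_reserved_seating`; these names are assumptions, not a confirmed Eventbrite schema, and the toy records are made up. The median is reported next to the mean because a handful of extreme capacities drags the mean far above a typical event.

```python
from collections import Counter
from datetime import datetime
from statistics import median

def summarize(events):
    """Compute the five trends for a list of flattened event records (field names assumed)."""
    capacities = [e["capacity"] for e in events if e.get("capacity") is not None]
    starts = [datetime.fromisoformat(e["start_local"]) for e in events]
    return {
        # 1. Events per ISO (year, week): the counts behind a date histogram.
        "events_per_week": Counter(tuple(d.isocalendar())[:2] for d in starts),
        # 2. Capacity: the median resists huge outliers, the mean does not.
        "capacity_median": median(capacities),
        "capacity_mean": sum(capacities) / len(capacities),
        "capacity_min": min(capacities),
        "capacity_max": max(capacities),
        "over_50k": sum(c > 50_000 for c in capacities),
        # 3. The three most common start times.
        "top_start_times": Counter(d.strftime("%H:%M") for d in starts).most_common(3),
        # 4. Category counts, skipping missing IDs (R's NA).
        "categories": Counter(e["category_id"] for e in events if e.get("category_id")).most_common(),
        # 5. Share of events with reserved seating.
        "reserved_share": sum(bool(e.get("is_reserved_seating")) for e in events) / len(events),
    }

# Toy records for illustration only; these are not rows from the real dataset.
toy = [
    {"start_local": "2019-03-29T18:00:00", "capacity": 120, "category_id": "101", "is_reserved_seating": False},
    {"start_local": "2019-03-30T18:00:00", "capacity": 500, "category_id": "101", "is_reserved_seating": False},
    {"start_local": "2019-04-02T09:30:00", "capacity": 38, "category_id": None, "is_reserved_seating": True},
    {"start_local": "2019-04-05T19:00:00", "capacity": 1000, "category_id": "102", "is_reserved_seating": False},
]
summary = summarize(toy)
print(summary["capacity_median"], summary["capacity_mean"])
```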
RECOMMENDATIONS FOR MY PARENTS
I narrowed the 2500 events in my dataset down to the 5 events that my parents would be most interested in going to. To see how I did this, please see the appendix below titled "Data Extraction and Cleaning". Their recommendations for this week include two exclusive fine-dining experiences, a cycling event, a sponsored walk in Richmond Park and a cognitive bias research event (Karen is pursuing a PhD in psychology).
One of the columns in this dataset was "Event Description". In the future, our machine learning algorithm could get better at finding events for Karen and Peter by searching descriptions for the words that appear largest in the word cloud below. For example, events that Karen and Peter like are probably more likely to contain the words "food", "chef", "travel", "culinary" and "global".
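A word cloud simply scales each word by how often it appears. The Python sketch below (the project itself used R) shows that counting step, plus one hypothetical way to use the result: score each candidate event by how many of the highlighted words its description contains and keep the top k, matching the 5 recommendations "Brite Box" promises. This is a plain keyword-count heuristic, not a trained machine learning model; the stopword list and candidate events are made up, and only the five keywords come from the text above.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for illustration; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "at", "from", "our"}

def tokens(text):
    """Lowercase alphabetic words in a piece of text."""
    return re.findall(r"[a-z]+", text.lower())

def word_counts(descriptions):
    """Word frequencies across descriptions: the quantity a word cloud scales its words by."""
    return Counter(w for text in descriptions for w in tokens(text) if w not in STOPWORDS)

def top_k(events, keywords, k=5):
    """Rank events by how many keyword occurrences their description contains; keep the best k."""
    def score(event):
        return sum(1 for w in tokens(event["description"]) if w in keywords)
    return sorted(events, key=score, reverse=True)[:k]  # sorting is stable, so ties keep input order

# Keywords from the word cloud discussion; the candidate events are hypothetical.
KEYWORDS = {"food", "chef", "travel", "culinary", "global"}
candidates = [
    {"name": "Guest chef supper club", "description": "A culinary evening with a guest chef"},
    {"name": "Spreadsheet basics", "description": "Learn formulas and pivot tables"},
    {"name": "Global food market tour", "description": "Travel the world through global food"},
]
print([e["name"] for e in top_k(candidates, KEYWORDS, k=2)])
```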
Next, I use proto.io to create an MVP of the "Brite Box" web layout, to see whether my parents like this product and would be willing to pay for it. The screen below is the low-fidelity wireframe that each user will see.
If the Incubation Team likes this layout, we can then populate the prototype with recommendations for Karen and Peter to simulate the user experience at a higher level of fidelity. Here, I have populated the wireframe with the 5 personalized events that I retrieved from the API data.
From this, we can introduce my parents to the "Brite Box" product, test how much they would be willing to pay for it, and see how this changes their perception of Eventbrite as a company.
Companies often say that they want to be more data-driven in their decision making. While I think data is important for internal business decisions, data also offers us new ways to create exceptional user experiences, as I have explored with my product concept for "Brite Box".
"Brite Box" is a possible solution for increasing attendee engagement and creating a more tailored online experience for premium customers who are looking for Eventbrite events. The product should be feasible for software engineers to build, and it will be successful if it increases the number of events each attendee goes to per month (hopefully a key performance indicator (KPI) that Eventbrite already measures).
Appendix: Data Extraction and Cleaning
Using the Eventbrite API website, I scraped data on about 2500 events near my parents' house, to gather geographically relevant events. I used my Eventbrite API token to load the data into R, loaded some of the basic packages that I planned to use, and read the data returned by the Eventbrite API URL as JSON.
I want to retrieve the first 50 pages of information on events in Notting Hill to discover a variety of options for my parents. First, I check to see that the events are in the right location:
Success! I then read all 50 pages into R (the code is truncated, and I smudged the URL to hide my token).
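The original R code and URL are hidden, so the Python sketch below does not reproduce them. It only illustrates the general pattern of generating one request URL per results page and fetching each page as JSON. The endpoint, the `location` and `page` parameter names, bearer-token authentication and the `EVENTBRITE_TOKEN` environment variable are all assumptions for illustration. Keeping the token in a request header rather than in the URL means it never appears in the URLs themselves.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint: the real URL is hidden in the original, so this is NOT it.
BASE_URL = "https://api.example.invalid/events/search/"

def page_urls(base_url, params, n_pages):
    """Build one request URL per results page, numbered from 1 ('page' parameter name assumed)."""
    return [f"{base_url}?{urlencode({**params, 'page': p})}" for p in range(1, n_pages + 1)]

def fetch_json(url, token):
    """Fetch one page, sending the token in a header (assumed bearer auth) instead of the URL."""
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request) as response:
        return json.load(response)

def fetch_all(urls):
    """Fetch every page; the environment variable name is hypothetical."""
    token = os.environ["EVENTBRITE_TOKEN"]
    return [fetch_json(u, token) for u in urls]

# Build (but do not fetch) the 50 page URLs; 'location' is a hypothetical parameter name.
urls = page_urls(BASE_URL, {"location": "Notting Hill, London"}, n_pages=50)
print(urls[0])
```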
I bind all these pages into a single "eventscrape" dataset to work with. From this basic JSON format, we see that "eventscrape" has nested dataframes within it. For example, "start" and "end" contain nested dataframes, and the "description" column contains both "html" and "text".
I extract these columns with nested dataframes and clean them separately.
To make the dataframe workable, I also remove many variables that I don't need, such as logo_id and privacy_setting. I rename some of the cleaned columns before reattaching them to the dataset.
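The binding, flattening, dropping and renaming steps above were done in R. A comparable Python sketch treats each event as a nested dictionary and flattens it so that, for example, `description` → `text` becomes a `description_text` column. Several details are assumptions: that each parsed page holds its events under an `events` key, that `start` and `end` contain a `local` timestamp, and the new column names in `RENAME`. Only `logo_id`, `privacy_setting` and the `html`/`text` description fields come from the text, and the toy pages are made up.

```python
def combine_pages(pages):
    """Stack the event lists from every parsed JSON page (assumes each page has an 'events' list)."""
    return [event for page in pages for event in page.get("events", [])]

def flatten(record, parent=""):
    """Turn nested fields such as {'description': {'text': ...}} into 'description_text'."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat

DROP = {"logo_id", "privacy_setting"}                              # unused columns named in the text
RENAME = {"start_local": "event_start", "end_local": "event_end"}  # hypothetical target names

def clean(pages):
    """Combine pages, flatten each event, drop unused columns and rename the rest."""
    rows = []
    for event in combine_pages(pages):
        flat = flatten(event)
        rows.append({RENAME.get(k, k): v for k, v in flat.items() if k not in DROP})
    return rows

# Toy pages shaped like the nested data described above; all values are made up.
pages = [
    {"events": [{"id": "1", "logo_id": "x", "privacy_setting": "unlocked",
                 "start": {"local": "2019-03-29T18:00:00"}, "end": {"local": "2019-03-29T21:00:00"},
                 "description": {"text": "Guest chef dinner", "html": "<p>Guest chef dinner</p>"}}]},
    {"events": [{"id": "2",
                 "start": {"local": "2019-04-02T09:30:00"}, "end": {"local": "2019-04-03T01:00:00"},
                 "description": {"text": "Charity walk", "html": "<p>Charity walk</p>"}}]},
]
rows = clean(pages)
print(sorted(rows[0]))
```

If pandas is available, `pandas.json_normalize(events, sep="_")` performs the same kind of flattening in one call.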
To make the Event Start Date and Event End Date columns workable, I loaded the data into OpenRefine and cleaned these columns. I then loaded the data back into R and began selecting fun events that my parents would like, based on preferences I gathered from them through an informal qualitative survey.
Here I filter the data so that it only shows events which start and end on the same day. My parents do not like to stay out past midnight! I then keep only the events that contain the words "dog" or "workout", "art" and "charity", to find more specific recommendations based on interests I know they have.
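The two filters in this step can be sketched in Python as follows (the project itself used R). The column names `event_start`, `event_end` and `description_text` are hypothetical, the timestamps are assumed to be ISO strings like those cleaned in OpenRefine, and the rows are made up; only the four keywords come from the text.

```python
import re
from datetime import datetime

INTERESTS = {"dog", "workout", "art", "charity"}  # keywords named in the text

def same_day(row):
    """True if the event starts and ends on the same calendar date, so it finishes before midnight."""
    start = datetime.fromisoformat(row["event_start"])
    end = datetime.fromisoformat(row["event_end"])
    return start.date() == end.date()

def mentions_any(row, words):
    """True if the description contains any of the words as a whole word (case-insensitive)."""
    found = set(re.findall(r"[a-z]+", row.get("description_text", "").lower()))
    return not found.isdisjoint(words)

# Made-up rows for illustration.
rows = [
    {"id": "1", "event_start": "2019-03-29T18:00:00", "event_end": "2019-03-29T21:00:00",
     "description_text": "Charity supper"},
    {"id": "2", "event_start": "2019-04-02T22:00:00", "event_end": "2019-04-03T02:00:00",
     "description_text": "Art after dark"},
    {"id": "3", "event_start": "2019-04-05T10:00:00", "event_end": "2019-04-05T12:00:00",
     "description_text": "Start-up pitch morning"},
]
shortlist = [r["id"] for r in rows if same_day(r) and mentions_any(r, INTERESTS)]
print(shortlist)  # ['1']
```

Matching whole words is a deliberate choice here: a plain substring test for "art" would also match words such as "start" (as in the third row) and "party".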
First, I look for the right art event. Realistically, my mother does not want to go to free events, and she wants an event with reserved seating. She also likes small events, so I limit the capacity of the event to fewer than 100 people.
I have found the perfect "art"-related event for my parents: the Sheekey Guest Chef Series: Mark Sargeant.
Now I am looking for appropriate charity events. Again, I subset the data to those events which are not free and whose capacity is less than 100 people. I find that there are 3 such events that my parents might want to attend.
Finally, moving on to dog events, I limit the capacity to 50 people. Interestingly, the event that comes up is not about dogs but is another cooking-themed event that I believe my parents will also enjoy.
Findings: I combine the 3 types of events my parents would want to go to into a new dataset called "personalized events". It contains 5 different events: a cycling trip, two special fine-dining experiences, an outdoor walk to raise money, and a cognitive biases lecture (Karen is getting a PhD in psychology).
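The three subset filters and the final combination step from this appendix can be sketched in Python as shown below (the original filtering was done in R). The thresholds come from the text: paid events with reserved seating and fewer than 100 guests for art, paid events with fewer than 100 guests for charity, and at most 50 guests for dog events. The field names `description_text`, `is_free`, `is_reserved_seating` and `capacity` are hypothetical, and the rows are made up. Removing duplicates by `id` is an addition for this sketch, since one event can match more than one interest.

```python
import re

def has_word(row, word):
    """Whole-word, case-insensitive match against the plain-text description."""
    return word in re.findall(r"[a-z]+", row["description_text"].lower())

def art_events(rows):
    # Paid, reserved seating, fewer than 100 guests.
    return [r for r in rows if has_word(r, "art") and not r["is_free"]
            and r["is_reserved_seating"] and r["capacity"] < 100]

def charity_events(rows):
    # Paid, fewer than 100 guests.
    return [r for r in rows if has_word(r, "charity") and not r["is_free"] and r["capacity"] < 100]

def dog_events(rows):
    # At most 50 guests.
    return [r for r in rows if has_word(r, "dog") and r["capacity"] <= 50]

def personalized_events(rows):
    """Combine the three subsets, keeping each event once in first-seen order."""
    seen, combined = set(), []
    for r in art_events(rows) + charity_events(rows) + dog_events(rows):
        if r["id"] not in seen:
            seen.add(r["id"])
            combined.append(r)
    return combined

# Made-up rows for illustration.
rows = [
    {"id": "a", "description_text": "Art and wine with a guest chef", "is_free": False, "is_reserved_seating": True, "capacity": 60},
    {"id": "b", "description_text": "Charity walk in the park", "is_free": False, "is_reserved_seating": False, "capacity": 80},
    {"id": "c", "description_text": "Free charity art fair", "is_free": True, "is_reserved_seating": False, "capacity": 300},
    {"id": "d", "description_text": "Dog friendly brunch", "is_free": False, "is_reserved_seating": False, "capacity": 40},
    {"id": "e", "description_text": "Charity art auction", "is_free": False, "is_reserved_seating": True, "capacity": 90},
]
print([r["id"] for r in personalized_events(rows)])  # ['a', 'e', 'b', 'd']
```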