This problem involves analyzing user activity logs stored in log.csv. Each line represents a single event with the format:
timestamp,user_id,event_type,event_parameter
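
As a rough illustration of loading this format, the sketch below reads rows into tuples. The timestamp layout (`%Y-%m-%d %H:%M:%S`), the absence of a header row, and the numeric `event_type` are all assumptions, since the text does not specify them; `read_events` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Assumption: timestamps look like "2020-04-19 10:00:00". Adjust to the real file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_events(lines):
    """Parse CSV lines into (timestamp, user_id, event_type, event_parameter) tuples.

    Assumes no header row and an integer event_type.
    """
    events = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        timestamp, user_id, event_type, event_parameter = row
        events.append(
            (
                datetime.strptime(timestamp, TIMESTAMP_FORMAT),
                user_id,
                int(event_type),
                event_parameter,
            )
        )
    return events


# Usage with in-memory data; for the real file, pass open("log.csv", newline="").
sample = io.StringIO("2020-04-19 10:00:00,u1,2,video\n2020-04-19 10:05:00,u2,1,click\n")
print(len(read_events(sample)))  # 2
```
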
- Definition: A session is a sequence of events by the same user
- Session End Rule: Session ends after 30+ minutes of inactivity
- Goal: Count sessions that started on 2020-04-19
- Video Event: event_type = 2 and event_parameter = "video"
- Goal: Find the day with the maximum number of unique users who watched videos
- Return: The count of unique users (not the date)
- Goal: Find the 5-minute interval [time, time + 5 minutes) with the most events
- Tie-breaker: If multiple intervals have the same count, return the latest one
- Return: Start time in format YYYY-MM-DD_hh:mm:ss
- Group events by user_id and sort by timestamp
- For each user, identify session start points:
  - First event starts a session
  - Any event after a 30+ minute gap starts a new session
- Count sessions that started on the target date
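
The session-counting steps above could be sketched as follows. This is a minimal illustration, not the actual solution.py: `count_sessions_started_on` is a hypothetical name, events are assumed to be already parsed into `(datetime, user_id)` pairs, and a gap of exactly 30 minutes is assumed to end a session (reading "30+" as "30 or more").

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Assumption: "30+ minutes" means a gap of 30 minutes or more.
SESSION_GAP = timedelta(minutes=30)


def count_sessions_started_on(events, target_day):
    """Count sessions whose first event falls on target_day.

    events: iterable of (timestamp: datetime, user_id) pairs, in any order.
    """
    by_user = defaultdict(list)
    for timestamp, user_id in events:
        by_user[user_id].append(timestamp)

    count = 0
    for timestamps in by_user.values():
        timestamps.sort()
        previous = None
        for timestamp in timestamps:
            # A user's first event, or any event after a long gap, starts a session.
            starts_session = previous is None or timestamp - previous >= SESSION_GAP
            if starts_session and timestamp.date() == target_day:
                count += 1
            previous = timestamp
    return count


sample = [
    (datetime(2020, 4, 19, 10, 0), "u1"),
    (datetime(2020, 4, 19, 10, 45), "u1"),  # 45-minute gap: new session
    (datetime(2020, 4, 19, 10, 50), "u2"),
]
print(count_sessions_started_on(sample, date(2020, 4, 19)))  # 3
```

Note that a session is attributed to the day of its first event, so a session that begins late on 2020-04-18 and continues past midnight is not counted for 2020-04-19.
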
- Filter events with type=2 and parameter="video"
- Group by date and collect unique user_ids per day
- Return the maximum count of unique users
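
One possible shape for the video-user steps above is shown below. It is a sketch under assumptions: `max_daily_video_users` is a hypothetical name, events are assumed to be already parsed into 4-tuples, and `event_type` is assumed to have been converted to an integer.

```python
from collections import defaultdict
from datetime import datetime


def max_daily_video_users(events):
    """Return the largest number of distinct video-watching users on any single day.

    events: iterable of (timestamp: datetime, user_id, event_type: int, event_parameter).
    Returns 0 when no video events are present.
    """
    users_by_day = defaultdict(set)
    for timestamp, user_id, event_type, event_parameter in events:
        if event_type == 2 and event_parameter == "video":
            # A set keeps each user once per day, however many videos they watch.
            users_by_day[timestamp.date()].add(user_id)
    return max((len(users) for users in users_by_day.values()), default=0)


sample = [
    (datetime(2020, 4, 19, 9, 0), "u1", 2, "video"),
    (datetime(2020, 4, 19, 9, 5), "u1", 2, "video"),  # same user, same day
    (datetime(2020, 4, 19, 9, 7), "u2", 2, "video"),
    (datetime(2020, 4, 20, 9, 0), "u3", 1, "video"),  # wrong event_type, ignored
]
print(max_daily_video_users(sample))  # 2
```
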
- Sort all timestamps chronologically
- Use a sliding window approach to check each possible 5-minute interval
- Count events in each window and track the best (preferring later times for ties)
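
The sliding-window steps above might look like the two-pointer sketch below. It assumes that only event timestamps need to be tried as window starts: any window with the maximum count can be shifted right until its start lands on its first event without losing events, so the latest best window always begins at an event. `busiest_interval_start` and `format_start` are hypothetical names.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def busiest_interval_start(timestamps):
    """Return (start, count) for the busiest half-open window [start, start + 5 min).

    Ties are resolved in favour of the latest start. Returns (None, 0) for no events.
    """
    times = sorted(timestamps)
    best_start, best_count = None, 0
    right = 0  # index of the first event at or after the current window's end
    for left, start in enumerate(times):
        while right < len(times) and times[right] < start + WINDOW:
            right += 1
        count = right - left
        # ">=" lets a later window with an equal count replace an earlier one.
        # For repeated timestamps, the first copy has the highest count, so
        # later copies never overwrite it with a smaller value.
        if count >= best_count:
            best_start, best_count = start, count
    return best_start, best_count


def format_start(start):
    """Render a start time as YYYY-MM-DD_hh:mm:ss."""
    return start.strftime("%Y-%m-%d_%H:%M:%S")


sample = [
    datetime(2020, 4, 19, 10, 0, 0),
    datetime(2020, 4, 19, 10, 1, 0),
    datetime(2020, 4, 19, 11, 0, 0),
    datetime(2020, 4, 19, 11, 4, 59),
]
start, count = busiest_interval_start(sample)
print(format_start(start), count)  # 2020-04-19_11:00:00 2
```

Because `right` only moves forward, the scan after sorting is linear in the number of events.
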
<session_count> <max_unique_users> <interval_start_time>
Example: 67890 111 2020-01-31_10:09:12
python solution.py
Prerequisites:
- log.csv file in the same directory
- Python 3.6+ with datetime module
- Data Analysis
- Log Processing
- Time Series Analysis
- Sliding Window Algorithm
- Session Management
- Time: O(n log n) due to sorting operations
- Space: O(n) for storing user activities and timestamps