Gather and store mailing list subscriber statistics #1570

Open
GregKaleka opened this issue Dec 23, 2024 · 5 comments

@GregKaleka
Collaborator

In an effort to further measure engagement and activity in the C++ community, we should collect and store data on mailing list subscriber flows and levels. This should include the total number of subscribers, plus new subscriptions and unsubscriptions by date. We will likely use this data in reports.

@GregKaleka GregKaleka self-assigned this Dec 23, 2024
@sdarwin
Collaborator

sdarwin commented Dec 23, 2024

mm2 and mm3 are each separate tasks.

For mm2, I have installed this script on wowbagger:

https://github.com/cppalliance/boost-mailman/blob/master/scripts/mm2_stats.sh

list_members outputs a plain list of emails. Count that number each day. Store it in a file. It's at least a step towards a solution.
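A minimal Python sketch of the daily count described above. It assumes the text produced by list_members has already been captured as a string, with one address per line; the function names and the `date,total` CSV layout are hypothetical and are not taken from mm2_stats.sh.

```python
import csv
from datetime import date
from pathlib import Path


def count_members(list_members_output: str) -> int:
    """Count non-blank lines, assuming one email address per line."""
    return sum(1 for line in list_members_output.splitlines() if line.strip())


def append_daily_count(stats_file: Path, day: date, total: int) -> None:
    """Append one 'YYYY-MM-DD,total' row to the stats file (hypothetical layout)."""
    with stats_file.open("a", newline="") as fh:
        csv.writer(fh).writerow([day.isoformat(), total])
```

Appending rather than overwriting keeps one row per day, so later reports can read the whole history from a single file.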

For mm3, a script should query the core db's address table and examine the registered_on date. (There is both a 'core' and a 'web' db on the same server.) Select the addresses whose registered_on falls in the period since the last Boost release, and count them.
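A rough Python sketch of that query. It uses an in-memory sqlite3 database as a stand-in so the example runs anywhere; the real core db is a separate server, so the connection code would differ. Only the address table and the registered_on column come from this thread; the sample rows and the release date are made up.

```python
import sqlite3


def count_registered_since(conn: sqlite3.Connection, since: str) -> int:
    """Count address rows whose registered_on is at or after `since`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM address WHERE registered_on >= ?", (since,)
    ).fetchone()
    return row[0]


# Stand-in database with made-up rows; timestamps are stored as ISO text,
# which compares correctly as plain strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, email TEXT, registered_on TEXT)"
)
conn.executemany(
    "INSERT INTO address (email, registered_on) VALUES (?, ?)",
    [
        ("a@example.com", "2024-01-16 22:05:40"),
        ("b@example.com", "2024-09-01 09:00:00"),
        ("c@example.com", "2024-11-20 12:30:00"),
    ],
)

release_date = "2024-08-14 00:00:00"  # hypothetical date of the last release
print(count_registered_since(conn, release_date))
```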

For each of these methods, which are quite different, can you think of an algorithm to solve for "unsubscribes"? (I have an answer :-) )

PRs are welcome on https://github.com/cppalliance/boost-mailman

@GregKaleka
Collaborator Author

> For each of these methods, which are quite different, can you think of an algorithm to solve for "unsubscribes"? (I have an answer :-) )

Is your algorithm more sophisticated than looking for emails that disappear from the [list_members|address] from one day to the next?

@sdarwin
Collaborator

sdarwin commented Dec 23, 2024

Possibly.

mm2:

Now that the log file has been discovered, parse that.

Before, I was thinking:

  • At the end of each computational run, replace/dump a file with the latest list of subscribers, such as "subscribers.txt".
  • At the start of the next day's cron run, diff the new list against that file. Count the addresses that have disappeared; those are the unsubscribers from the last day.
  • Record both subscriber and unsubscriber values, along with the total, in the output file.
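The three steps above can be sketched in Python as a set difference between yesterday's and today's lists. The function names are hypothetical, and lowercasing addresses before comparing is an assumption, made so that a change in capitalization is not counted as an unsubscribe plus a subscribe.

```python
def parse_members(text: str) -> set[str]:
    """Turn a one-address-per-line dump (e.g. subscribers.txt) into a set."""
    # Lowercasing is an assumption, not something the thread specifies.
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def daily_flow(previous: set[str], current: set[str]) -> dict[str, int]:
    """Compare two daily snapshots and report the flows and the total."""
    return {
        "subscribed": len(current - previous),    # present today, absent yesterday
        "unsubscribed": len(previous - current),  # present yesterday, absent today
        "total": len(current),
    }


yesterday = parse_members("a@example.com\nb@example.com\nc@example.com\n")
today = parse_members("a@example.com\nc@example.com\nd@example.com\ne@example.com\n")
print(daily_flow(yesterday, today))
```

One limitation: someone who subscribes and unsubscribes between two runs never appears in either snapshot, so this undercounts both flows slightly.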

mm3:

Let's say a report is run every 3 months. Observe the total number of subscribers compared to the last period: it has increased by 10. Observe the monotonically increasing id field of the address table: it has increased by 15. That would mean 15 new subscribers and 5 unsubscribers during the period. So the trick I am suggesting is to check the id field of the db table as part of the calculation.
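The arithmetic can be written down as a small Python sketch. The `Snapshot` type and the sample totals (100 to 110 rows, ids 120 to 135) are hypothetical; only the deltas of 10 and 15 come from the example above. It assumes ids are never reused and that every id was handed to a real address row.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    total: int   # number of rows in the address table at report time
    max_id: int  # highest value of the id column at report time


def period_flow(prev: Snapshot, curr: Snapshot) -> tuple[int, int]:
    """Return (new_subscribers, unsubscribers) between two reports."""
    new = curr.max_id - prev.max_id  # every new row consumed one id
    net = curr.total - prev.total    # net change in the row count
    return new, new - net            # rows that were added but are not in the net


# Total grew by 10, max id grew by 15.
print(period_flow(Snapshot(total=100, max_id=120), Snapshot(total=110, max_id=135)))
```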

@sdarwin
Collaborator

sdarwin commented Dec 23, 2024

Edit: updated text, above

@sdarwin
Collaborator

sdarwin commented Dec 23, 2024

The verification step is a complication, though. Here is a snippet from the address db table:


 id |                                             email                                             | _original | display_name |        verified_on         |       registered_on        | user_id | preferences_id 
----+-----------------------------------------------------------------------------------------------+-----------+--------------+----------------------------+----------------------------+---------+----------------
  1 | [email protected]                                                                           |           |              | 2024-01-16 22:05:41.064171 | 2024-01-16 22:05:40.595222 |       1 |              1

In that case, maybe verified_on is the field to examine, instead of registered_on.

If someone "registers" but never completes "verify" then shouldn't they just be ignored? Not yet a real subscriber.

Then all calculations of any type (subscribers or unsubscribers) should exclude addresses with an empty verified_on. That would affect the "monotonically increasing id field" idea, or it would need to be adjusted.
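One possible adjustment, sketched in Python under stated assumptions: ignore rows with an empty verified_on, count new subscribers by verified_on date rather than by id, and derive unsubscribers from the net change in the verified total. The `Address` type, the function name, and the sample rows are hypothetical; this is only one way to adapt the idea, not a settled method.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Address(NamedTuple):
    id: int
    verified_on: Optional[datetime]  # None means never verified


def verified_flow(
    rows: list[Address], period_start: datetime, previous_verified_total: int
) -> tuple[int, int]:
    """Return (new_verified, unsubscribed) for the period starting at period_start."""
    verified = [r for r in rows if r.verified_on is not None]
    new = sum(1 for r in verified if r.verified_on >= period_start)
    net = len(verified) - previous_verified_total
    return new, new - net


rows = [
    Address(1, datetime(2024, 1, 16, 22, 5, 41)),
    Address(2, None),  # registered but never verified, so ignored
    Address(3, datetime(2024, 11, 1, 8, 0, 0)),
    Address(4, datetime(2024, 12, 1, 8, 0, 0)),
]
# Made-up scenario: 3 verified addresses at the last report, 2 of them since removed.
print(verified_flow(rows, datetime(2024, 10, 1), previous_verified_total=3))
```

This needs the verified total stored at each report, since it cannot be recovered from the current table once rows are deleted.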


2 participants