VegasEducation
  • Threads: 1
  • Posts: 42
Joined: Jan 13, 2024
Thanked by
Mental
June 28th, 2024 at 2:47:07 PM permalink
A package called "camelot." It basically does it in one line of code.
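
For anyone following along, here is a minimal sketch of what that one line looks like in practice. It assumes Camelot is installed (the pip package is `camelot-py`) and that the table sits on page 1 with ruled cell borders; the function name and file paths are hypothetical, not the code actually used for this project.

```python
def first_table_to_csv(pdf_path, csv_path, pages="1", flavor="lattice"):
    """Read the first table Camelot finds and save it as CSV.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this file can be loaded without Camelot installed.
    import camelot

    # This is the "one line": Camelot locates and parses the tables.
    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    if len(tables) == 0:
        raise ValueError(f"no tables found in {pdf_path} on pages {pages}")

    table = tables[0]
    table.to_csv(csv_path)  # write the parsed table out as CSV
    return table.df         # pandas DataFrame of the same data
```

Usage would be something like `first_table_to_csv("report.pdf", "report.csv")`. If the table has no ruling lines, `flavor="stream"` is the other mode Camelot offers.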

For 2004 to 2018, I used pypdf. It just extracted everything as text, and then I did a bunch of string splitting to format it how I wanted.
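
A rough sketch of that text-then-split approach, under some assumptions: each table row comes out of pypdf as one line of text, with a text label followed by numeric columns. The sample rows and function names are made up for illustration; the real reports will need their own splitting rules.

```python
import re

# A token counts as numeric if it looks like 12, 1,234.50 or -1.10
NUMBER = re.compile(r"-?\d[\d,]*(\.\d+)?")

def extract_text(pdf_path):
    """Return the text of every page, joined by newlines (requires pypdf)."""
    from pypdf import PdfReader  # lazy import: parse_rows works without pypdf
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def parse_rows(text):
    """Split each line into (label, [numbers]); skip lines with no numbers."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        numbers = []
        # Peel numeric tokens off the end of the line; the rest is the label.
        while parts and NUMBER.fullmatch(parts[-1]):
            numbers.insert(0, float(parts.pop().replace(",", "")))
        if parts and numbers:
            rows.append((" ".join(parts), numbers))
    return rows

# Hypothetical extracted text, not real data.
sample = (
    "Hypothetical Game A 12 1,234.50 5.25\n"
    "Page header with no numbers\n"
    "Hypothetical Game B 3 987 -1.10"
)
rows = parse_rows(sample)
```

Peeling numbers off the right-hand side means labels with spaces in them stay in one piece, which plain `split()` on its own would break up.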

And yes, this is basically how I'm learning to program. I have a problem and I figure out how to solve it. The code may never be the cleanest or the most efficient way of doing it, but I've always been able to figure out what I need.
Mental
  • Threads: 16
  • Posts: 1549
Joined: Dec 10, 2018
June 29th, 2024 at 5:30:39 AM permalink
What I read about Camelot is that it is prone to errors if the formatting of the table changes in subtle ways. Camelot depends on the format of the table to figure out which text to extract. The data glitches were probably not your coding errors, but rather changes in the table format due to extraneous factors like the pandemic.

My text-based approach would also fail if the column layout of the table changes. I should probably use awk to make sure the extracted table data looks right.
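
The same kind of sanity check can be sketched in Python instead of awk (the awk version of the field-count test would be along the lines of `awk -F, 'NF != 3'`). This assumes the extracted data is already CSV text with no header row; the function name, column positions, and sample rows are hypothetical.

```python
import csv
import io

def find_bad_rows(csv_text, expected_columns, numeric_columns=()):
    """Return (line_number, reason) for every row that looks wrong."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        # A shifted column layout usually shows up as a wrong field count.
        if len(row) != expected_columns:
            problems.append(
                (line_no, f"expected {expected_columns} fields, got {len(row)}")
            )
            continue
        # Columns that should hold numbers must parse as numbers.
        for col in numeric_columns:
            try:
                float(row[col].replace(",", ""))
            except ValueError:
                problems.append(
                    (line_no, f"column {col} is not numeric: {row[col]!r}")
                )
    return problems

# Hypothetical extracted data: row 2 lost a column, row 3 has text in a number slot.
sample = "Game A,12,1234.50\nGame B,3\nGame C,x,55\n"
problems = find_bad_rows(sample, expected_columns=3, numeric_columns=(1, 2))
```

An empty `problems` list does not prove the extraction is right, only that every row has the expected shape.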

I installed Camelot and will play around with it. I have to download many of my W-2G forms in PDF format, and I need a better way of turning them into CSV lists.
Last edited by: Mental on Jun 29, 2024
Gambling is a math contest where the score is tracked in dollars. Try not to get a negative score.