[DeTomaso] any reason to keep old monthly newsletters and Profiles?
Jeff Detrich
jjdetrich at gmail.com
Tue Oct 9 13:13:59 EDT 2018
Originally, we were going to use tags, i.e., metadata, to identify what was
in an article. Those tags would then be used to search the articles. Tags
actually work better than general Google-type searches. When you google,
any article containing a matching word is listed as a match. So if you
were looking for manifold info using Google, almost every article would
show up as a match.
To use tags, you might need 20 or more tags to describe what the article
was about. They could range from the general functional area down to
specifics like the brake pads or motor oil used. They would also include
the year and model of the car. Attached is a doc giving ideas. Getting to
the articles would use a Boolean search (commas to separate the keywords),
and then you'd get back a list of the articles with matches.
The problem with this is that someone would have to go through each of the
articles and input the tags.
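
A rough sketch of how the comma-separated tag search could work is below.
It is only an illustration under stated assumptions: the article IDs and
tags are made up, the names ARTICLE_TAGS, parse_query and search are
hypothetical, and treating the comma as AND (every keyword must match) is
an assumption, since the original plan only says the search is Boolean.

```python
# Hypothetical tag index: article IDs and tags are invented for illustration.
ARTICLE_TAGS = {
    "article-001": {"pantera", "1972", "brakes", "brake pads"},
    "article-002": {"pantera", "1974", "engine", "intake manifold", "motor oil"},
    "article-003": {"mangusta", "1969", "engine", "intake manifold"},
}


def parse_query(query):
    """Split a comma-separated query into a set of lowercase keywords."""
    return {term.strip().lower() for term in query.split(",") if term.strip()}


def search(query, index=ARTICLE_TAGS, match_all=True):
    """Return the sorted IDs of articles whose tags match the query.

    match_all=True  -> comma acts as AND (assumed default).
    match_all=False -> comma acts as OR.
    """
    terms = parse_query(query)
    if not terms:
        return []
    hits = []
    for article_id, tags in index.items():
        if match_all:
            matched = terms <= tags        # every keyword is one of the tags
        else:
            matched = bool(terms & tags)   # at least one keyword is a tag
        if matched:
            hits.append(article_id)
    return sorted(hits)


if __name__ == "__main__":
    print(search("pantera, intake manifold"))
    print(search("brakes, motor oil", match_all=False))
```

Because a match requires a whole tag rather than any word in the text, a
query for "intake manifold" only returns articles someone has tagged that
way, which is the advantage over a general full-text search.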
Jeff
On Mon, Oct 8, 2018 at 2:38 PM Julian Kift <julian_kift at hotmail.com> wrote:
> I wrote virtually the same thing to Terry earlier and forgot to copy
> the list;
>
> Agreed, but making PDF's fully searchable is not quite as
> straightforward, or it would have been done. I'm not sure whether
> all the archives are Word converted to PDF documents or scanned copies
> of originals, the latter makes it a step harder again and you need an
> OCR image search capability.
>
> The first step would probably be a fully searchable index, that at
> least identifies where the relevant search articles are located. I'm
> sure that part is not rocket science, but either way Terry is your man!
>
> I just checked and it looks like everything pre-2013 is scanned in,
> later is converted/saved from Word document.
>
> Julian
> __________________________________________________________________
>
> From: DeTomaso <detomaso-bounces at server.detomasolist.com> on behalf of
> Himes, Terry (397C) via DeTomaso <detomaso at server.detomasolist.com>
> Sent: Monday, October 8, 2018 12:32 PM
> To: Michael Cox; detomaso at server.detomasolist.com
> Subject: Re: [DeTomaso] any reason to keep old monthly newsletters and
> Profiles?
>
> Eeeech. I hope not. But, yeah, that would make double-trouble. I gotta
> get my hands on them to see fer sur.
> "A Purple Heart proves you were smart enough to hatch a plan,
> stupid enough to try it and lucky enough to survive!"
> Terry W. Himes
> JPL Jet Propulsion Laboratory
> Dawn Spacecraft Team
> Juno Systems & Software Team
> TGO Sequence Lead
> Phone: (818) 393-6261
> Cell: (818) 653-8213
> thimes at jpl.nasa.gov
> From: Michael Cox <coxmichaelt at gmail.com>
> Date: Monday, October 8, 2018 at 12:20 PM
> To: Terry Himes <Terry.Himes at jpl.nasa.gov>,
> "detomaso at server.detomasolist.com" <detomaso at server.detomasolist.com>
> Subject: Re: [DeTomaso] any reason to keep old monthly newsletters and
> Profiles?
> Terry volunteered:
> > I probably can do that. I do it every day for Terabytes of telemetry
> > data.
> > Right now I am converting 6TB of binary (channelized engineering
> > telemetry) into readable and searchable text data. The 6TB will
> > balloon to over 60TB.
> >
> > If your data is text, or has any metadata, it would be much easier.
> > MySQL would be easy, but either ElasticSearch or DynamoDB (nosql)
> > would be better, maybe.
> >
> > Of course, I've offered to help before... only Asa Jay has taken me
> > up on it.
> >
> > Terry
> Since the docs are scanned I would bet they are images compiled into a
> PDF. You'd have to crank up your fancy OCR software first. :^)
> --michael cox
>
> References
>
> 1. mailto:thimes at jpl.nasa.gov
> _______________________________________________
>
>
> Detomaso Email List is not managed by POCA
> Posted emails must not exceed 1.5 Megabytes
> DeTomaso mailing list
> DeTomaso at server.detomasolist.com
> http://server.detomasolist.com/mailman/listinfo/detomaso
>
> To manage your subscription (change email address, unsubscribe, etc.) use
> the links above.
>
> Members who post to this list grant license to the list to forward any
> message posted here to all past, current, or future members of the list.
> They also grant the list owner permission to maintain an archive or approve
> the archiving of list messages.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6TechAdmin - KeyWordsIssues.jpg
Type: image/jpeg
Size: 197009 bytes
Desc: not available
URL: <http://server.detomasolist.com/pipermail/detomaso/attachments/20181009/6840a75b/attachment.jpg>